Data Science, Machine Learning, and Deep Learning are the buzzwords of the moment.

There are a lot of MOOCs and online courses / certifications available for these topics as well. I have always worked on text and wanted to enroll in such a course with Natural Language Processing in focus. Having gone through the syllabus and contents of a few courses online, I felt the need to create a curriculum of my own. This is because the online material available is scattered, and I kept stepping away from the course to fill gaps in my knowledge. So this is my attempt at educating myself and updating my NLP knowledge. Feel free to use this and modify it to your needs.

Major Topics that I need to work on:

  1. Programming - Python, PyTorch
  2. Math - Linear Algebra, Probability, Statistics
  3. NLP - Linguistics, Statistical NLP, Deep NLP
  4. Data Science - Pandas
  5. MOOC - Andrew Ng, Stanford NLP, Oxford DeepMind lectures, edX, Fast.ai
  6. Reference - Jason Brownlee, Dan Jurafsky's Speech and Language Processing book, Cracking the Coding Interview

I am going to give myself about 8 months to finish my own curriculum. The test is that I come up with my own project: an implementation of something interesting in NLP using deep learning (more like a thesis, if possible). I will grade myself, and I must say I am my own worst critic. So trust me, this is a difficult assignment!

I will update this list whenever I find something new to add.

  1. Python:
    1. Generators
    2. Vectorization
    3. Data Structures
      1. NumPy structures and their implementation
      2. scikit-learn structures and their implementation
    4. Algorithms
      1. Indexing and searching in a Python dictionary
      2. Interview Cake implementations for best time and space complexity
    5. Matrix assignment
  2. Linear Algebra:
    1. Matrices and Vectors
    2. Tensors
  3. Probability:
    1. Conditional Probability - heads/tails (coin-toss) questions for interviews
  4. Statistics:
    1. Definitions, Metrics
    2. Correlation
    3. Distributions
  5. Statistical NLP:
    1. Vectorizer / Transformer - scikit-learn
    2. HMM
    3. CRF
    4. Topic Modelling - LDA
    5. Sequence labelling
    6. Feature selection
    7. Dimensionality reduction - PCA, ICA
  6. Deep NLP:
    1. Word2vec - CBOW / Skip-gram
    2. Representation Learning
    3. CNN for text
    4. RNN for text
    5. Attention model for text
    6. PyTorch for text
    7. BERT / Transformers - Hugging Face
  7. Linguistics:
    1. Discourse Segmentation
  8. Pandas:
    1. Dataframe manipulations
  9. MOOC:
    1. Andrew Ng - CNN
    2. Stanford NLP - Richard Socher
    3. Oxford
    4. Fast.ai - Rachel Thomas
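
As a warm-up for the first Python items in the list (generators, and dictionary-style lookups), here is a minimal sketch of the kind of exercise I have in mind. It uses only the standard library. The names `tokens`, `bigrams`, and `corpus` are my own illustrations, not taken from any of the courses above, and the whitespace tokenizer is a deliberate simplification.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def tokens(lines: Iterable[str]) -> Iterator[str]:
    """Lazily yield lowercase, whitespace-separated tokens one at a time."""
    for line in lines:
        for tok in line.lower().split():
            yield tok


def bigrams(toks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Lazily yield adjacent token pairs without building a token list first."""
    prev = None
    for tok in toks:
        if prev is not None:
            yield (prev, tok)
        prev = tok


# Hypothetical toy corpus; a real one would be streamed from a file.
corpus = ["the cat sat", "the cat ran"]

# Counter is a dict subclass, so each lookup is an average O(1) hash lookup.
# Note: in this toy sketch bigrams also span line boundaries ("sat", "the").
counts = Counter(bigrams(tokens(corpus)))

print(counts[("the", "cat")])
```

Because both functions are generators, nothing is read or paired until `Counter` starts consuming, so the same pipeline should work on a corpus too large to hold in memory.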