Foundations of Statistical NLP
by Kavita Ganeshan
Notes from the book “Foundations of Statistical Natural Language Processing” by Manning and Schütze
I thought I would share my notes from this classic NLP book. I really enjoy the examples, quotes, and narration it uses. It takes you through the absolute basics of probability and linguistics before moving on to more complex modelling of language.
Preliminaries
- Questions relevant to Linguistics
- What kind of things do people say?
- What do these things say/ask/request about the world?
- Lexical resources
- Brown Corpus (American English)
- Lancaster-Oslo-Bergen Corpus (British English)
- Susanne Corpus (130,000-word subset of the Brown Corpus)
- Penn Treebank (Wall Street Journal articles)
- Canadian Hansards (Canadian Parliament Proceedings - Bilingual Corpus)
- WordNet (dictionary; hierarchy of synsets, i.e. sets of synonymous words; meronymy, i.e. part-whole relations)
- Zipf's law (principle of least effort)
- f \cdot r = k {f: frequency, r: rank (position in the frequency-sorted list), k: constant}
- Number of meanings of a word: m \propto \sqrt{f}
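To see what the rank-frequency relation means in practice, here is a minimal sketch that counts words, ranks them by frequency, and multiplies each frequency by its rank. The function name `rank_frequency` and the toy token list are made up for illustration; the list is constructed so the product stays roughly constant, which on real text you would only expect approximately and only on a large corpus.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, frequency, frequency * rank) rows, most frequent first."""
    counts = Counter(tokens)
    # Sort by descending frequency; break ties alphabetically so the order is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, freq * rank)
            for rank, (word, freq) in enumerate(ordered, start=1)]

# Toy token list (hypothetical data, built so that f * r is nearly constant).
tokens = ("the " * 6 + "of " * 3 + "and " * 2 + "cat").split()

for rank, word, freq, product in rank_frequency(tokens):
    print(rank, word, freq, product)
```

If Zipf's law holds for a corpus, the last column should hover around the same value k across most ranks.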
- Collocation
- Phrasal verbs, compound nouns, idioms
- Frequent bigrams filtered by a particular part-of-speech pattern (this still lets through noise like “next year”)
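The bigram-plus-tag-pattern idea can be sketched as follows. This is only an illustration under simplifying assumptions: the input is already tagged by hand, the tag set (A = adjective, N = noun, V = verb, D = determiner) and the two patterns are my own shorthand, and `candidate_collocations` is a hypothetical helper name.

```python
from collections import Counter

# Tag patterns kept as collocation candidates (assumed for this sketch):
# adjective + noun and noun + noun.
PATTERNS = {("A", "N"), ("N", "N")}

def candidate_collocations(tagged_tokens, min_count=2):
    """Count adjacent word pairs whose tags match a pattern; keep the frequent ones."""
    bigrams = zip(tagged_tokens, tagged_tokens[1:])
    counts = Counter(
        (w1, w2)
        for (w1, t1), (w2, t2) in bigrams
        if (t1, t2) in PATTERNS
    )
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

# Hand-tagged toy data (hypothetical).
tagged = [
    ("stock", "N"), ("market", "N"), ("fell", "V"),
    ("next", "A"), ("year", "N"), ("the", "D"),
    ("stock", "N"), ("market", "N"), ("rose", "V"),
    ("next", "A"), ("year", "N"),
]

for pair, n in candidate_collocations(tagged):
    print(" ".join(pair), n)
```

Both "stock market" and "next year" pass the filter here, which shows the noise problem: the tag pattern alone cannot tell a real collocation from a merely frequent adjective-noun pair.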
- Concordance
- KWIC - Keyword in Context
- Verb frames
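A keyword-in-context display is simple to build by hand. The sketch below lists every occurrence of a keyword with a few words of context on each side, which is the kind of view you would scan to work out a verb's frames. The function `kwic`, its `width` parameter, and the example sentence are all made up for illustration.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            # Take up to `width` tokens on each side, clipped at the text boundaries.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

# Hypothetical example sentence.
text = "She showed him the photo and then he showed the house to the buyers"

for left, key, right in kwic(text.split(), "showed"):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>15}  {key}  {right}")
```

Lining up the keyword in one column makes the patterns after the verb easy to compare, for example "showed him the photo" (two objects) against "showed the house to" (object plus a "to" phrase).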