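Many readers have asked, through various channels, how to learn natural language processing. I wrote a few short posts on this long ago, "How to Learn Natural Language Processing" (《如何学习自然语言处理》) and "A Few Introductory Books on Natural Language Processing" (《几本自然语言处理入门书》), but I would more strongly recommend this Zhihu Q&A thread: "What Is the Fastest Way to Get Started with Natural Language Processing" (自然语言处理怎么最快入门). It includes a systematic answer from Zhou Ming of Microsoft Research Asia and a generous contribution from Liu Zhiyuan of Tsinghua University, "How Beginners Can Look Up Academic Resources in the Field of Natural Language Processing (NLP)" (初学者如何查阅自然语言处理(NLP)领域学术资料), along with what many other readers have selflessly shared.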

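That said, for anyone hoping to get started in NLP, I recommend reading this book first: Speech and Language Processing, whose first edition was translated into Chinese as 《自然语言处理综论》. Its authors are both leading figures in NLP: Professor Dan Jurafsky of Stanford University and Professor James H. Martin of the University of Colorado. This was also my own introductory book; I read the Chinese edition (translated from the first English edition) and the second English edition. The third edition is currently being written. The authors have finished quite a few chapters, and every completed chapter can be downloaded: Speech and Language Processing (3rd ed. draft). Judging from the chapter list, the third edition adds a number of chapters on deep learning for NLP, and both its content and its length have been substantially updated relative to earlier editions: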

 

| Chapter | Slides | Relation to 2nd ed. |
| --- | --- | --- |
| 1: Introduction | | [Ch. 1 in 2nd ed.] |
| 2: Regular Expressions, Text Normalization, and Edit Distance | Text [pptx] [pdf]; Edit Distance [pptx] [pdf] | [Ch. 2 and parts of Ch. 3 in 2nd ed.] |
| 3: Finite State Transducers | | |
| 4: Language Modeling with N-Grams | LM [pptx] [pdf] | [Ch. 4 in 2nd ed.] |
| 5: Spelling Correction and the Noisy Channel | Spelling [pptx] [pdf] | [expanded from pieces in Ch. 5 in 2nd ed.] |
| 6: Naive Bayes Classification and Sentiment | NB [pptx] [pdf]; Sentiment [pptx] [pdf] | [new in this edition] |
| 7: Logistic Regression | | |
| 8: Neural Nets and Neural Language Models | | |
| 9: Hidden Markov Models | | [Ch. 6 in 2nd ed.] |
| 10: Part-of-Speech Tagging | | [Ch. 5 in 2nd ed.] |
| 11: Formal Grammars of English | | [Ch. 12 in 2nd ed.] |
| 12: Syntactic Parsing | | [Ch. 13 in 2nd ed.] |
| 13: Statistical Parsing | | |
| 14: Dependency Parsing | | [new in this edition] |
| 15: Vector Semantics | Vector [pptx] [pdf] | [expanded from parts of Ch. 19 and 20 in 2nd ed.] |
| 16: Semantics with Dense Vectors | Dense Vector [pptx] [pdf] | [new in this edition] |
| 17: Computing with Word Senses: WSD and WordNet | Intro, Sim [pptx] [pdf]; WSD [pptx] [pdf] | [expanded from parts of Ch. 19 and 20 in 2nd ed.] |
| 18: Lexicons for Sentiment and Affect Extraction | SentLex [pptx] [pdf] | [new in this edition] |
| 19: The Representation of Sentence Meaning | | |
| 20: Computational Semantics | | |
| 21: Information Extraction | | [Ch. 22 in 2nd ed.] |
| 22: Semantic Role Labeling and Argument Structure | SRL [pptx] [pdf]; Select [pptx] [pdf] | [expanded from parts of Ch. 19 and 20 in 2nd ed.] |
| 23: Neural Models of Sentence Meaning (RNN, LSTM, CNN, etc.) | | |
| 24: Coreference Resolution and Entity Linking | | |
| 25: Discourse Coherence | | |
| 26: Seq2seq Models and Summarization | | |
| 27: Machine Translation | | |
| 28: Question Answering | | |
| 29: Conversational Agents | | |
| 30: Speech Recognition | | |
| 31: Speech Synthesis | | |

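In addition, one of the book's authors, Professor Dan Jurafsky of Stanford University, once offered a natural language processing course on Coursera: Natural Language Processing. The course no longer seems to be searchable on Coursera's new course platform, but we have made a backup on Baidu Netdisk (百度网盘) that includes the course videos and the second English edition of the book. The two work better when studied together: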

Updated March 2018: Link: https://pan.baidu.com/s/1Wp35AyHY1PrmisA4deoC6Q Password: sps4

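For anyone who has been looking for a way into natural language processing, working through this book and this course first is a prerequisite; everything starts with a solid foundation.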