
Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit

This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation. With it, you'll learn how to write Python programs that work with large collections of unstructured text. You'll access richly annotated datasets using a comprehensive range of linguistic data structures, and you'll understand the main algorithms for analyzing the content and structure of written communication.

Packed with examples and exercises, Natural Language Processing with Python will help



This book will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages -- or if you're simply curious to have a programmer's perspective on how human language works -- you'll find Natural Language Processing with Python both fascinating and immensely useful.

502 pages, Paperback

First published January 1, 2009


About the author

Steven Bird

3 books44 followers

Ratings & Reviews


Community Reviews

5 stars
204 (36%)
4 stars
226 (40%)
3 stars
101 (18%)
2 stars
20 (3%)
1 star
3 (<1%)
Displaying 1 - 30 of 49 reviews
Profile Image for Manny.
Author 34 books14.9k followers
January 30, 2014
[Editor's preface to the second edition: notgettingenough read the first edition of this review and complained that it was all Geek to her. I have amended it accordingly]
POLONIUS: What do you read, my lord?
HAMLET: Words, words, words.
Hamlet was evidently interested in textual analysis, and if the Python Natural Language Toolkit (NLTK) had been available in Elsinore I'm sure he'd have bought this book too. I'd heard good things about it, and it doesn't disappoint: the authors have done a terrific job of combining a lot of freeware tools and resources into a neat package.

They say they want it to be accessible even to people who have no software development experience; this may be just a little optimistic, but try it for yourself and see what you think. They've certainly made every effort to get you hooked from the beginning. Ten minutes after downloading the software, I was able to produce a randomized version of Monty Python and the Holy Grail with a single command:
Python 2.6.6
>>> import nltk
>>> nltk.download()
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>> text6.generate()
Building ngram index...
SCENE 1 : Well , I see . Running away , And his nostrils
raped and his bottom burned off , And his pen -- SIR ROBIN
: We are just not used to handsome knights . Nay . Nay .
Come on . Anybody armed must go too . OFFICER # 1 : No .
Not only by surprise . Not only by surprise . Not the
Knights Who Say ' Ni '. KNIGHTS OF NI : Ni ! ARTHUR :
You know much that is . Yeah , a swallow ' s got a point .
SOLDIER #
>>>
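The session above does not show what happens behind "Building ngram index...". As a rough illustration only, and assuming a plain bigram model rather than whatever NLTK actually does internally, random text of this kind can be produced with the standard library alone. The names build_bigram_index and generate below are made up for this sketch, and the sample text is invented:

```python
import random
from collections import defaultdict

def build_bigram_index(tokens):
    """Map each token to the list of tokens that follow it in the text."""
    index = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        index[current].append(following)
    return index

def generate(tokens, length=12, seed=0):
    """Random walk over the bigram index, starting from the first token."""
    rng = random.Random(seed)          # fixed seed so runs are repeatable
    index = build_bigram_index(tokens)
    word = tokens[0]
    output = [word]
    for _ in range(length - 1):
        followers = index.get(word)
        if not followers:              # dead end: nothing ever follows this word
            break
        word = rng.choice(followers)
        output.append(word)
    return output

# Invented sample text, already split into tokens.
sample = "ARTHUR : We are the knights . KNIGHTS : We are the knights who say Ni .".split()
print(" ".join(generate(sample)))
```

Every adjacent pair of words in the output occurs somewhere in the input, which is why the result looks locally plausible and globally nonsensical.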
So what else can it do? Geeks may want to skip to the example below, but here's a brief summary. The toolkit contains three kinds of materials. First, there's a well-selected set of texts, packaged up so that they can easily be used. Some of them are listed above; there are a couple of dozen more that you can quickly locate.

Second, there's a bunch of tools which you can use to analyze the texts. For example, there's an interface to WordNet, which is a kind of digitized super-thesaurus containing tens of thousands of words and concepts, all neatly arranged into a complex hierarchy with the most general concepts at the top and the most specific ones at the bottom. There's a tool called a "part-of-speech tagger", which takes a piece of text and guesses the part of speech (noun, verb, adjective, etc.) for each word in the context in which it appears. There are "parsers", which can analyze sentences in terms of grammatical function: finding subjects, objects, main verbs, and so on. And there are plenty of other things, in particular easy ways to incorporate machine learning methods, which you can train yourself by giving them examples.

Third, there's Python itself, which is the glue that sticks all these things together. I'd somehow never used Python before, but it's a concise and elegant language that's easy to learn if you already have some software skills. If you know Perl, Ruby, Unix shell-scripting, or anything like that, you'll be up and flying in no time. You can write scripts which are just a few lines long, but which do a whole lot of stuff: read a file from the web, chop it up into individual words and sentences, find all the sentences that have some particular property you're searching for, and then display everything as a neat table or graph.
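To give a feel for how short such a script can be, here is a minimal sketch that uses only the standard library, not NLTK. The splitting rules are deliberately naive, the function names are invented for this illustration, and the text is an in-memory string rather than a file read from the web:

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_words(sentence):
    """Lowercased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())

def sentences_containing(text, target):
    """Return (sentence number, word count, sentence) for sentences using target."""
    rows = []
    for number, sentence in enumerate(split_sentences(text), start=1):
        words = split_words(sentence)
        if target in words:
            rows.append((number, len(words), sentence))
    return rows

# Invented sample text.
TEXT = "She had a horse. The horse was old! Was the cow young? Nobody knew."

# Display the matching sentences as a small table.
for number, count, sentence in sentences_containing(TEXT, "horse"):
    print("%3d %5d  %s" % (number, count, sentence))
```

A real tokenizer has to cope with abbreviations, quotations and the like, which is the kind of detail a toolkit takes care of for you.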

The rest of the review will probably only be interesting to geeks, but if that's you, please read on...
________________________________

I finished the book yesterday, and I've just spent a few hours messing around writing little scripts to see what it can do. Here's the most entertaining one. I thought it would be interesting to be able to locate all the words in a text that refer to animals. NLTK includes a handy interface to WordNet, so the first job was to write a function which checks whether a word could refer to a concept lower in the hierarchy than the one for 'animal'. It's never quite as easy as you first think; after a little experimentation, I realized that I had to block words which referred to animals only by virtue of referring to human beings. The final definition looks like this:
from nltk.corpus import wordnet as wn

animal_synset = wn.synset('animal.n.01')
human_synset = wn.synset('homo.n.02')

def is_animal_word(word):
    # Collect every hypernym on every path that does not pass through 'human'.
    hypernyms = [hyp
                 for synset in wn.synsets(word)
                 for path in synset.hypernym_paths()
                 for hyp in path
                 if human_synset not in path]
    return animal_synset in hypernyms
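For readers without WordNet installed, the same path-based idea can be tried on a toy hierarchy. Everything in this sketch is hypothetical: the PARENTS and SENSES tables are hand-made stand-ins, not WordNet data, and is_animal_word_toy is named differently to keep it apart from the real function:

```python
# Hypothetical miniature hierarchy: each concept maps to its direct parents.
PARENTS = {
    "entity": [],
    "animal": ["entity"],
    "artifact": ["entity"],
    "human": ["animal"],
    "insect": ["animal"],
    "bird": ["animal"],
    "ruler": ["human"],
    "butterfly": ["insect"],
    "swan": ["bird"],
    "writing_tool": ["artifact"],
}

# Hypothetical word senses: one word may name several concepts.
SENSES = {
    "monarch": ["ruler", "butterfly"],
    "king": ["ruler"],
    "pen": ["writing_tool", "swan"],
}

def hypernym_paths(concept):
    """All paths from a root concept down to `concept`, inclusive."""
    parents = PARENTS[concept]
    if not parents:
        return [[concept]]
    return [path + [concept]
            for parent in parents
            for path in hypernym_paths(parent)]

def is_animal_word_toy(word):
    """True if some sense lies below 'animal' on a path that avoids 'human'."""
    return any("animal" in path and "human" not in path
               for concept in SENSES.get(word, [])
               for path in hypernym_paths(concept))

for word in ["monarch", "king", "pen", "table"]:
    print(word, is_animal_word_toy(word))
```

In this toy data "king" is rejected because its only path to "animal" runs through "human", while "monarch" survives thanks to its butterfly sense.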
I then wrote a script which called my function to return all the animal words in the first n words of a piece of text:
def print_animal_words_v1(text, n):
    # Lowercase and de-duplicate the first n tokens before checking each one.
    words = set([w.lower() for w in text[:n]])
    animal_words = sorted(set([w for w in words
                               if is_animal_word(w)]))
    print("Animal words in first %d words" % n)
    print(animal_words)
They've packaged up a bunch of textual resources for easy access, so I could immediately test it on the first 50,000 words of Emma:
>>> from nltk.corpus import gutenberg
>>> emma = gutenberg.words('austen-emma.txt')
>>> print_animal_words_v1(emma, 50000)
Animal words in first 50000 words
['baby', 'bear', 'bears', 'blue', 'chat', 'chicken',
'cow', 'cows', 'creature', 'creatures', 'does',
'entire', 'female', 'fish', 'fly', 'games', 'goose',
'head', 'horse', 'horses', 'imagines', 'kite',
'kitty', 'martin', 'martins', 'monarch', 'mounts',
'oysters', 'pen', 'pet', 'pollards', 'shark',
'sharks', 'stock', 'tumbler', 'young']
A quick look at this reveals some suspicious candidates: for example, 'does' is most likely never used as the plural of 'doe', so shouldn't be counted as an animal word.

My second version of the script called another resource, a "tagger", which quickly goes through the text and tries to guess what part of speech each word is in the context in which it appears. I only look at the words whose tags start with an 'N', indicating that they have been guessed as nouns:
def print_animal_words_v2(text, n):
    print("Tagging first %d words" % n)
    # pos_tag returns (word, tag) pairs for the tokens it is given.
    tagged_words = nltk.pos_tag(text[:n])
    print("Tagging done")
    # Keep only the words guessed to be nouns (tags starting with 'N').
    words = set([w.lower() for (w, tag) in tagged_words
                 if tag.startswith('N')])
    animal_words = sorted(set([w for w in words
                               if is_animal_word(w)]))
    print("Animal words in first %d words" % n)
    print(animal_words)
Now I get a shorter list, which in particular omits the suspicious 'does':
>>> print_animal_words_v2(emma, 50000)
Tagging first 50000 words
Tagging done
Animal words in first 50000 words
['baby', 'bears', 'blue', 'chicken', 'cow',
'creature', 'creatures', 'female', 'games', 'goose',
'head', 'horse', 'horses', 'kitty', 'martin',
'martins', 'monarch', 'oysters', 'pet', 'pollards',
'shark', 'sharks', 'stock', 'tumbler', 'young']
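The effect of the noun filter can be seen without running a tagger at all. In this sketch the (word, tag) pairs are written by hand in the Penn Treebank style, where noun tags begin with 'N', and ANIMAL_CANDIDATES is a hypothetical stand-in for the WordNet check:

```python
# Hand-written (word, tag) pairs standing in for real tagger output.
TAGGED = [
    ("Emma", "NNP"), ("does", "VBZ"), ("not", "RB"),
    ("fear", "VB"), ("bears", "NNS"), (".", "."),
]

# Hypothetical stand-in for the WordNet check: words that could name an animal.
ANIMAL_CANDIDATES = {"does", "bears"}

def animal_words_untagged(tagged):
    """Version 1 behaviour: ignore the tags and test every word."""
    return sorted({w.lower() for w, _ in tagged
                   if w.lower() in ANIMAL_CANDIDATES})

def animal_words_nouns_only(tagged):
    """Version 2 behaviour: keep only words whose tag starts with 'N'."""
    nouns = {w.lower() for w, tag in tagged if tag.startswith("N")}
    return sorted(w for w in nouns if w in ANIMAL_CANDIDATES)

print(animal_words_untagged(TAGGED))    # ['bears', 'does']
print(animal_words_nouns_only(TAGGED))  # ['bears']
```

The verb "does" is tagged VBZ, so it never reaches the animal check in the second version.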
Well, that should be enough to give you the flavor of the thing. If you don't want to buy the book, it's available free online here. Have fun!
Profile Image for Ji.
167 reviews47 followers
February 15, 2018
I'm sure I'll come back to this book again and again. It's a truly good one: not only did I get to start learning the basics of text mining in Python using nltk, but I also learned some basic Python data processing ideas and routines. It's a shame that, with my limited knowledge so far, I could extract only a little of its value from reading it. Definitely five stars!
Profile Image for David.
Author 18 books371 followers
December 4, 2011
Excellent intro to both Python programming and NLP. Assumes no prior familiarity with either, so this is a good book both for beginning CS students who know little to nothing about linguistics, and for beginning linguists who have no programming experience.
Profile Image for Muhammad al-Khwarizmi.
123 reviews35 followers
October 13, 2016
Really very decent introduction to this particular library. A big caveat is that it contains a fairly large number of typos, even really obvious ones like "ibigrams", and includes some code that no longer works with the current version of NLTK. Another issue, at least from my point of view, is that it isn't really geared towards a streamlined approach to using NLTK as a tool for text mining: it makes numerous digressions into how to use Python for those entirely unfamiliar with it, and if you already know Python, you are going to find a good deal of this material tedious and distracting. As a consequence, I am now reading the NLTK 3 Cookbook by Packt. In addition, though this isn't really a fault as such, the orientation is more towards linguistic research than towards just getting things like text categorization done. All told, it still has my recommendation, easily.
Profile Image for Lucas.
150 reviews30 followers
February 18, 2019
This book is the best material I have found on Python's NLTK package. The package is not very intuitive if you are used to working with structured data, but the author presents it in a way that makes it really easy to understand. The book was written for people who have never programmed in Python, so it has some sections explaining very introductory things such as lists and dictionaries. Despite that, the book does present some complex material.
The book's aim is to show how you can turn ordinary text into structured data and analyze it with machine learning techniques. The structuring part is the simplest and covers the first 5 chapters. The book's greatest complexity lies precisely in its most interesting part, which covers extracting information from text.
I read only 7 of the 12 chapters. I stopped because I have not yet seen how I can use these techniques in my work, and I ended up losing some motivation. I will possibly return to the topic in the near future.
Profile Image for Shubhang Goswami.
17 reviews2 followers
April 22, 2018
A guide book on the NLTK toolkit that allows you to dissect language and make a computer understand language.

The authors build up from very simple models to complex ones as the book progresses, clearly laying down a story in front of us.

If you are well versed in Python you can skip the first 4 chapters and head straight to chapter 5. I liked this book, and it gave me an idea of how Google Assistant might parse data when I ask it to do something for me. It makes you look at language differently.

Would recommend!
Profile Image for Alex.
558 reviews40 followers
March 6, 2018
A solid resource that feels a bit overdue for an update. Much of the core linguistic information is still relevant I am sure, but there is no coverage of newer ML techniques (word embeddings, RNNs, etc.), and the Python NLTK library itself has been updated extensively enough since this was published to render many of the code examples incompatible with the current version. For those pursuing grammar-based approaches to NLP, though, this should still prove a useful reference.
Profile Image for Rob Young.
3 reviews11 followers
December 20, 2010
This is a fascinating book. It covers everything from text processing to statistics to lexical analysis. For most problems the solution is shown in both set theory notation and Python, making it much easier for a programmer to understand the theory.
Profile Image for Lujain.
12 reviews7 followers
May 21, 2011


I read some chapters of it, under duress, for my exam tomorrow :(
The book's style is very smooth..
Profile Image for Tim O'Hearn.
263 reviews1,169 followers
December 27, 2021
This subject fascinates me and I worked through this book front to back as quickly as I could. This is a complicated topic, and though the earlier chapters can inform shallow solutions that will certainly impress, the problem space is vast. There are a few reviews out there claiming that the book is "outdated"--please do not let that stop you from picking it up. It's an excellent in-depth manual for Python NLP practitioners.

Warning: if you haven't been exposed to a typical CS undergrad course load, the curve in the mid-late chapters will be steep for independent learners. Classes you'll ideally have taken to get the most out of this book: Discrete Structures (notation), Data Structures & Algorithms, Intro to Machine Learning, Programming Languages / Compilers (grammars). Also, spending an hour or two reviewing basic English grammar and language constructs will help.

You should also bring with you several imagined use cases. In the later chapters, the material can get pretty dense, and when I encountered methods that didn't obviously apply to my imagined use cases, it was much more difficult (I say--nearly impossible at first read) to pick up some of the more abstract concepts.
Profile Image for Nedret Efe.
14 reviews3 followers
July 29, 2020
Good intro to linguistics, NLP, and the open source NLTK Python toolkit.
Profile Image for Dave Peticolas.
1,377 reviews41 followers
October 8, 2014
A wonderful introduction to natural language processing using the NLTK toolkit. This book also serves as an introduction to Python for those new to the language (and to programming, though the pace is pretty fast). And for those like me, not new to Python but totally ignorant of NLP, it contains a wealth of interesting material. Finally, the examples showcase the elegance of Python as a language for text processing.
5 reviews
June 22, 2017
This excellent work moves the novice from the basics of natural language processing (NLP) into advanced topics with dexterity. NLP is the discipline of interpreting language as humans produce it with computational tools that computers require. By the time you complete this book you will be able to incorporate NLP into your data science workflow.

There are several things I love about this book. It strikes an excellent balance between theory and applications. It provides compelling use cases along with the actual code needed to resolve those use cases. And it has an intuitive organization starting with low-level, concrete topics, to high-level, abstract topics. That is, it covers:

+ processing raw text (managing characters with Unicode, pattern matching with regexes, standardizing text)
+ categorizing single words (determining their parts of speech, semantically classifying them based on linguistic context)
+ processing groups of words (mapping out entity synonymy and other semantic relations)
+ constructing grammars (my god, this was very difficult)
+ processing sentence meaning (by querying databases to return human-intelligible responses)
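As a small taste of the first item in that list, here is a standard-library sketch of Unicode normalization, text standardization and regex matching. It does not use NLTK, and the function names standardize and find_words_ending are invented for this illustration:

```python
import re
import unicodedata

def standardize(text):
    """NFKC-normalize, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)   # e.g. the 'fi' ligature becomes 'f' + 'i'
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()

def find_words_ending(text, suffix):
    """Words in the standardized text that end with `suffix`."""
    pattern = r"\b\w+" + re.escape(suffix) + r"\b"
    return re.findall(pattern, standardize(text))

# Invented sample: contains a ligature, a tab, a no-break space and extra spaces.
raw = "Classi\ufb01cation   of raw\ttext:  tagging,\u00a0parsing and chunking."

print(standardize(raw))
print(find_words_ending(raw, "ing"))
```

Normalizing before matching means the regex sees one consistent spelling of each character, which keeps the patterns simple.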

Sprinkled throughout, the writers expose you to lessons in writing scalable code, machine learning, and other computer science topics relevant to NLP. Even though all examples are in Python, focusing on the Natural Language Toolkit (NLTK) library, this book will trace the roadmap that every NLP project will take. Awesome.

This review and others here! http://www.autopoesis.tech/bookshelf/
Author 3 books3 followers
April 16, 2019
Goes into detail on a lot of different topics that can be relevant when processing natural language, like tokenization, part-of-speech tagging, grammars, logic, data formats and machine learning aspects, and includes many exercises. Sometimes it feels like it spans too many areas, from programming beginner chapters to more complex topics. I have to admit that reading was painful from time to time; it's quite a long book. But even though I considered giving only 3 stars (mainly due to the large amount of Python and the feeling that nltk is a bit dated), it's a very interesting read. You can consider its many topics a strength or a weakness: not everything will be relevant to you, but I have to say that I found most of it very interesting.
105 reviews46 followers
January 26, 2018
Even though the book has lost some generality as a whole, it still shows the workings of NLP algorithms much more clearly than other websites or blogs, and it certainly gave me a better vision of natural language processing. The book's approach is to state and explain with a hell of a lot of examples, and it was indeed a good idea. The last 2 chapters were really complex and hard to understand, but I'm sure that when I read about those topics again, the insight gained from this book will help me a lot. If you are also new to NLP and want to see its applications through the Python programming language, give it a read for sure.
Profile Image for Relax, you're doing fine.
73 reviews25 followers
April 8, 2019
A good book on natural language processing. However, it leans toward explaining the steps for studying how natural language is processed, from splitting text into words and sentences up to the higher level of representing the components of a sentence so that logical languages can be used to help the machine understand the sentence's meaning.

The book is thorough, light on math, and usable even by people who do not yet know Python.

However, if you need a book for applying NLTK to practical problems such as topic classification or sentiment analysis, this is not the book you need.
Profile Image for Jesús Navarro.
24 reviews
May 12, 2017
A very clear, simple and comprehensive book on the fundamentals of NLP, the techniques and theory involved, with practical examples. Covers tokenization, tagging, parsing, information extraction, classification, syntactic and semantic analysis.
Profile Image for Wasim Khan.
28 reviews8 followers
March 20, 2018
This book introduces both the Natural Language Toolkit and natural language processing, and it's a good book at that. Both theory and code examples are included in good measure. It's a must if you want to grasp NLP concepts before jumping to NLP packages.
Profile Image for Alfia.
69 reviews
August 31, 2019
Accessible intro to NLP concepts and practice in Python 2.7. The print version includes some deprecated code, so be sure to check the O'Reilly site and Stack Overflow for updates. Not sure when the ereader version was last updated, but it is more current than the print.
Profile Image for Synaps.
66 reviews10 followers
September 8, 2020
The now ubiquitous science of "natural language processing", whose applications include translation, autocorrect systems, search engines, chat bots, and much more, is here explained in a uniquely clear and pedagogic approach, requiring only a beginner's understanding of Python.
1 review
May 9, 2021
Really comprehensive overview of NLP techniques and methodology, as well as a look into the NLTK. Not a great reference text, but very good for getting an awareness of how to solve a wide range of NLP problems.
Profile Image for Danielle.
370 reviews4 followers
August 13, 2021
At this point, a lot of the content in this book is outdated. However, it is a clear read, and it covers the fundamentals. Now I must go on to find a more recent and specialized book on information extraction.
Profile Image for Charlotte.
79 reviews27 followers
April 8, 2022
☺️ Great book for processing data, building text corpora and running NLP analyses. Except for basic Python knowledge, no prior knowledge is needed. It’s fun and not dry at all.
Profile Image for Laura.
4 reviews
July 25, 2019
I really liked it and recommend it for writers at all levels.
