What is NLTK?

NLTK stands for Natural Language Toolkit. It is a platform for building Python programs that work with human language data.

By "natural language" we mean a language used for everyday communication by humans, such as English, Hindi, or Portuguese.

In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules.

Natural Language Processing, or NLP for short, covers in a wide sense any kind of computer manipulation of natural language. At one extreme, it can be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.
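As an illustration of the simple end of that spectrum, the sketch below counts word frequencies in two short text samples using only the Python standard library (the sample sentences and the function name are invented for this example):

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lowercase the text and count each alphabetic word token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Two made-up samples standing in for different "writing styles".
style_a = "The cat sat on the mat. The cat slept."
style_b = "A dog ran. A dog barked loudly."

print(word_frequencies(style_a).most_common(2))  # [('the', 3), ('cat', 2)]
print(word_frequencies(style_b).most_common(2))
```

Comparing the two frequency tables already reveals stylistic differences, such as which function words each sample favors.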

Technologies based on NLP are becoming increasingly widespread. For example, phones and handheld computers support predictive text and handwriting recognition; web search engines give access to information locked up in unstructured text; machine translation allows us to retrieve texts written in Chinese and read them in Spanish; text analysis enables us to detect sentiment in tweets and blogs. By providing more natural human-machine interfaces, and more sophisticated access to stored information, language processing has come to play a central role in the multilingual information society.

Natural Language Toolkit (NLTK)

NLTK was created in 2001 as part of a computational linguistics course in the Department of Computer and Information Science at the University of Pennsylvania. Since then it has been developed and expanded with the help of dozens of contributors. It has now been adopted by courses in dozens of universities, and serves as the basis of many research projects.

Language processing tasks and corresponding NLTK modules with examples of functionality

Language processing task | NLTK modules | Functionality
Accessing corpora | corpus | standardized interfaces to corpora and lexicons
String processing | tokenize, stem | tokenizers, sentence tokenizers, stemmers
Collocation discovery | collocations | t-test, chi-squared, point-wise mutual information
Part-of-speech tagging | tag | n-gram, backoff, Brill, HMM, TnT
Machine learning | classify, cluster, tbl | decision tree, maximum entropy, naive Bayes, EM, k-means
Chunking | chunk | regular expression, n-gram, named-entity
Parsing | parse, ccg | chart, feature-based, unification, probabilistic, dependency
Semantic interpretation | sem, inference | lambda calculus, first-order logic, model checking
Evaluation metrics | metrics | precision, recall, agreement coefficients
Probability and estimation | probability | frequency distributions, smoothed probability distributions
Applications | app, chat | graphical concordancer, parsers, WordNet browser, chatbots
Linguistic fieldwork | toolbox | manipulate data in SIL Toolbox format
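To make the table concrete, here is a minimal sketch exercising two of the string-processing modules listed above, tokenize and stem. It assumes NLTK is installed; the TreebankWordTokenizer and PorterStemmer used here are chosen because they work without downloading extra corpus data, and the sample sentence is invented:

```python
from nltk.stem import PorterStemmer              # the "stem" module from the table
from nltk.tokenize import TreebankWordTokenizer  # the "tokenize" module

tokenizer = TreebankWordTokenizer()
stemmer = PorterStemmer()

# Split an invented sample sentence into tokens, then reduce each to its stem.
tokens = tokenizer.tokenize("The cats were running quickly.")
stems = [stemmer.stem(t) for t in tokens]
print(stems)  # PorterStemmer lowercases: 'cats' -> 'cat', 'running' -> 'run'
```

Each module in the table follows the same pattern: import it, construct an object, and call a small set of consistently named methods.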

NLTK was designed with four primary goals in mind:

Simplicity: To provide an intuitive framework along with substantial building blocks, giving users a practical knowledge of NLP without getting bogged down in the tedious housekeeping usually associated with processing annotated language data
Consistency: To provide a uniform framework with consistent interfaces and data structures, and easily guessable method names
Extensibility: To provide a structure into which new software modules can be easily accommodated, including alternative implementations and competing approaches to the same task
Modularity: To provide components that can be used independently without needing to understand the rest of the toolkit
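The modularity goal means individual components can be used on their own. As a small sketch (assuming NLTK is installed), the FreqDist class from the probability module can be imported and used without touching any other part of the toolkit:

```python
from nltk import FreqDist  # frequency distributions, from the "probability" module

# A FreqDist counts any iterable of items; the characters of a string work too.
fd = FreqDist("abracadabra")
print(fd.most_common(1))  # [('a', 5)]
```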

Source:

https://www.nltk.org/book/ch00.html