The Python and the Elephant: Large Scale Natural Language Processing with NLTK and Dumbo (#120)
Presented by Nitin Madnani (University of Maryland, College Park); Dr. Jimmy J Lin (University of Maryland)
A practical look at NLTK and Dumbo, open-source, Python-powered toolkits and APIs for processing natural language on a large scale.
For those of us who make a living trying to make a computer "understand" human language, Python is a very powerful language, given its rapid prototyping abilities, native Unicode support and stellar standard library. This relationship has been strengthened further by the open-source, Python-based Natural Language Toolkit (NLTK; www.nltk.org), which is widely used in the community for both teaching and research and is gaining traction in the general Python community as well (www.nltk.org/book). Recently, the Python community has seen the release of Dumbo (http://wiki.github.com/klbostee/dumbo), an open-source, Python-based cloud-computing API built on Hadoop and written by Klaas Bosteels.
In this talk, we show how combining Python, NLTK and Dumbo makes very large-scale natural language processing both efficient and elegant.
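To give a flavour of the approach, here is a minimal sketch of a word-count job written as a mapper and a reducer, the shape that Hadoop-style jobs take. It is an illustration under stated assumptions rather than material from the talk: the regular-expression tokenizer stands in for an NLTK tokenizer such as nltk.word_tokenize, and run_locally is a hypothetical helper that imitates the map, shuffle and reduce steps in memory so the example runs without Hadoop. On a cluster, these two functions would instead be handed to Dumbo (conventionally via dumbo.run(mapper, reducer)).

```python
import re
from collections import defaultdict

# Stand-in tokenizer (assumption): a real job could call an NLTK
# tokenizer here instead of this simple regular expression.
TOKEN_RE = re.compile(r"[A-Za-z']+")


def mapper(key, value):
    """Emit (token, 1) for every token in one line of input text."""
    for token in TOKEN_RE.findall(value.lower()):
        yield token, 1


def reducer(key, values):
    """Sum the counts collected for a single token."""
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical helper: imitate map -> shuffle -> reduce in memory."""
    grouped = defaultdict(list)
    for offset, line in enumerate(lines):
        for token, count in mapper(offset, line):
            grouped[token].append(count)  # the "shuffle" step
    result = {}
    for token in sorted(grouped):
        for out_key, out_value in reducer(token, grouped[token]):
            result[out_key] = out_value
    return result


if __name__ == "__main__":
    print(run_locally(["The elephant and the python", "the python"]))
```

Because the mapper and reducer are plain generator functions, they can be tested on a laptop with a handful of lines before being run over a full corpus.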