December 2010
Create your own textformat and parse it
The parser is implemented as a simple state machine with no syntax tolerance.
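A strict state-machine parser of this kind can be sketched as follows. The bookmarked page's actual format is not reproduced here, so the format below (a `[section]` line followed by `key=value` lines, with a blank line closing the section) and the names `parse`, `EXPECT_SECTION`, and `IN_SECTION` are assumptions made purely for illustration.

```python
def parse(text):
    """Parse a hypothetical format with zero syntax tolerance.

    '[name]' opens a section, 'key=value' lines fill it, and a blank
    line closes it. Anything else raises SyntaxError immediately.
    """
    result = {}
    state = "EXPECT_SECTION"  # hypothetical state names
    current = None

    for lineno, line in enumerate(text.splitlines(), start=1):
        if state == "EXPECT_SECTION":
            if line.startswith("[") and line.endswith("]") and len(line) > 2:
                current = result.setdefault(line[1:-1], {})
                state = "IN_SECTION"
            else:
                raise SyntaxError(f"line {lineno}: expected '[section]'")
        elif state == "IN_SECTION":
            if line == "":
                # A blank line is the only legal way to leave a section.
                state = "EXPECT_SECTION"
            elif "=" in line:
                key, value = line.split("=", 1)
                current[key] = value
            else:
                raise SyntaxError(f"line {lineno}: expected 'key=value'")

    return result


if __name__ == "__main__":
    print(parse("[a]\nx=1\n\n[b]\ny=2"))
```

Because every state accepts only the lines it explicitly expects, malformed input fails at the first offending line instead of being silently repaired.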
Site Perceptive » A Book About (Python) Books
The first element I had to come up with was a 4-5 page summary of what I intended to cover. Since this was a beginner’s guide to text processing, I chose topics such as XML, PyParsing, Nucular, and some introductory material on the Python IO system.
November 2010
July 2010
davedash's textcluster at master - GitHub
This is a memory-intensive app that stores documents in a corpus and uses an inverted index to group objects together.
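The idea of grouping documents through an inverted index can be sketched as below. This is not textcluster's actual code or API: the function names `build_index` and `group`, the whitespace tokenizer, and the `min_shared` threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in set(text.lower().split()):
            index[token].add(doc_id)
    return index


def group(docs, min_shared=2):
    """Greedily group documents sharing at least `min_shared` tokens.

    `min_shared` is a hypothetical similarity threshold, not a
    parameter taken from textcluster.
    """
    index = build_index(docs)
    assigned = set()
    groups = []
    for doc_id, text in enumerate(docs):
        if doc_id in assigned:
            continue
        # Count shared tokens by walking the index, so only documents
        # that overlap with this one are ever examined.
        shared = defaultdict(int)
        for token in set(text.lower().split()):
            for other in index[token]:
                if other != doc_id and other not in assigned:
                    shared[other] += 1
        members = [doc_id] + sorted(o for o, n in shared.items() if n >= min_shared)
        assigned.update(members)
        groups.append(members)
    return groups


if __name__ == "__main__":
    docs = [
        "python text parsing",
        "parsing text with python",
        "latent semantic analysis",
        "semantic analysis of documents",
    ]
    print(group(docs))
```

The whole corpus and index live in memory, which is consistent with the description of the app as memory-intensive.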
September 2009
Natural Language Processing with Python - O'Reilly Media
This book offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies ranging from predictive text and email filtering to automatic summarization and translation. You'll learn how to write Python programs to analyze the structure and meaning of texts, drawing on techniques from the fields of linguistics and artificial intelligence.
February 2009
A Lexical Analyzer for HTML and Basic SGML
A self-contained specification of a lexical analyzer that uses automated parsing techniques to handle SGML document types limited to a tractable set of SGML features. An implementation is available as well.
TextSTAT - Simple Text Analysis Tool
TextSTAT is a concordance program designed to be user-friendly and to provide simple Internet functionality. Texts can be combined to form corpora (which can also be stored as such). The program analyses these text corpora and displays word frequency lists and concordances for search terms. The program is written in Python and offered here as a Windows program. TextSTAT is freeware.
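Word frequency lists and concordances of the kind described above can be sketched in a few lines of Python. This is not TextSTAT's code: the function names `tokenize`, `word_frequencies`, and `concordance`, the simple regex tokenizer, and the `width` parameter are assumptions for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each word occurs across a corpus of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def concordance(texts, term, width=2):
    """Return (left context, keyword, right context) for each hit.

    `width` is the number of words of context kept on each side.
    """
    term = term.lower()
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == term:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    corpus = ["The cat sat on the mat.", "A cat is a cat."]
    print(word_frequencies(corpus).most_common(3))
    for left, keyword, right in concordance(corpus, "cat"):
        print(f"{left:>12} [{keyword}] {right}")
```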
Text Analysis Info - Information retrieval software
Programs listed here can be divided into more subtle groups:
* pure information retrievers: searching and displaying texts, indexers
* concordancers: programs providing concordances
Content Analysis in Python
The scripts presented here are not intended to teach programming; I assume you have at least a vague idea about that already. Nor are they intended to exemplify fine coding style. The point is to show how easy things can be, if you pick the right tools.
The OpenNLP Homepage
OpenNLP is an organizational center for open source projects related to natural language processing. Its primary role is to encourage and facilitate the collaboration of researchers and developers on such projects.
montylingua :: a free, commonsense-enriched natural language understander
MontyLingua is a free*, commonsense-enriched, end-to-end natural language understander for English. Feed raw English text into MontyLingua, and the output will be a semantic interpretation of that text. Perfect for information retrieval and extraction, request processing, and question answering. From English sentences, it extracts subject/verb/object tuples, extracts adjectives, noun phrases and verb phrases, and extracts people's names, places, events, dates and times, and other semantic information.
January 2009
Joseph Wilk » Latent Semantic Analysis in Python
Latent Semantic Analysis (LSA) is a mathematical method that tries to bring out latent relationships within a collection of documents. Rather than looking at each document isolated from the others it looks at all the documents as a whole and the terms within them to identify relationships.
NLTK Home (Natural Language Toolkit)
Open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks.
October 2005
Configuring the Universal Text Parser
The Universal Text Parser enables you to link an external data source with the Meta-Directory join engine. With the Universal Text Parser, you can synchronize a wide variety of text-based data with your Meta-Directory views.

