This year
parcon 0.1.25 : Python Package Index
Parcon is a parser library. It can be used for parsing both normal text and binary data. It's designed to be easy to use and to provide informative error messages.
2011
introducing esprima: blazing-fast javascript parser | don't code today
In a nutshell, Esprima (esprima.org) is a JavaScript parser written in pure JavaScript.
neilkodner.com / An analysis of Steve Jobs tribute messages displayed by Apple
Sitepatching - YouTube, Twitter, more Hotmail
PATCH-444, Make Twitter hashtags visible. Twitter does some script magic and ends up with broken element nesting: <span><strong>foo</span></strong> in turn causing Opera's layout engine to be upset. This will be fixed with the new parser so we'll just patch it meanwhile.
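The broken nesting described in that patch note can be detected mechanically. The following is a minimal sketch using Python's standard `html.parser`; the names `NestingChecker` and `find_misnested` are hypothetical, and this is not how Opera's patch or parser works. It only tracks a stack of open tags and reports any end tag that does not match the innermost open element.

```python
from html.parser import HTMLParser

# Elements that never take an end tag; they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Record (closed_tag, innermost_open_tag) for every mismatched end tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            return
        # Mismatch: note which tag was closed and which one was expected.
        expected = self.stack[-1] if self.stack else None
        self.errors.append((tag, expected))
        if tag in self.stack:
            # Drop the most recent matching open tag so checking can continue.
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx]


def find_misnested(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


print(find_misnested("<span><strong>foo</span></strong>"))
```

The sketch ignores HTML's optional end tags (for example an unclosed `<p>` or `<li>`), so it is only suitable for simple inline markup such as the snippet quoted above.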
microdata.py at master from edsu/microdata - GitHub
Python library for extracting HTML5 microdata.
the EnAKTing blog › Fast SPARQL XML Results Parser in Python
Rewrote our original SPARQL XML results parser to use Expat, the fast, non-validating XML parser.
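For illustration, an Expat-based reader for the SPARQL XML results format could look roughly like the sketch below. It uses Python's standard `xml.parsers.expat` module; the function name `parse_sparql_results` and the sample document are assumptions made for this example, not the EnAKTing code.

```python
import xml.parsers.expat


def parse_sparql_results(xml_text):
    """Return one {variable: value} dict per <result> element."""
    rows = []
    state = {"row": None, "var": None, "text": None}

    def local(name):
        # With namespace_separator=" ", Expat reports names as "uri localname".
        return name.rsplit(" ", 1)[-1]

    def start(name, attrs):
        tag = local(name)
        if tag == "result":
            state["row"] = {}
        elif tag == "binding":
            state["var"] = attrs["name"]
        elif tag in ("uri", "literal", "bnode") and state["var"] is not None:
            state["text"] = []

    def chars(data):
        # Character data may arrive in several chunks, so collect the pieces.
        if state["text"] is not None:
            state["text"].append(data)

    def end(name):
        tag = local(name)
        if tag in ("uri", "literal", "bnode") and state["text"] is not None:
            state["row"][state["var"]] = "".join(state["text"])
            state["text"] = None
        elif tag == "binding":
            state["var"] = None
        elif tag == "result":
            rows.append(state["row"])
            state["row"] = None

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return rows


# Hypothetical sample input in the SPARQL XML results format.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="s"/><variable name="label"/></head>
  <results>
    <result>
      <binding name="s"><uri>http://example.org/a</uri></binding>
      <binding name="label"><literal>Alpha</literal></binding>
    </result>
  </results>
</sparql>"""

print(parse_sparql_results(SAMPLE))
```

The sketch keeps only the text of each binding and discards the term type, datatype, and language tag, which a fuller parser would preserve.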
2010
Create your own textformat and parse it
Implemented the parser as a simple state machine with no syntax tolerance.
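A strict state-machine parser of this kind can be sketched in a few lines of Python. The `key=value;` format and the name `parse_pairs` are hypothetical, chosen only to illustrate the idea; the linked article's actual text format is not shown here. "No syntax tolerance" is modelled by raising an error on the first character that the current state does not allow.

```python
def parse_pairs(text):
    """Parse 'key=value;key=value;' strictly; raise ValueError on any deviation."""
    KEY, VALUE = "key", "value"
    state, key, buf, out = KEY, None, [], {}
    for pos, ch in enumerate(text):
        if state == KEY:
            if ch == "=":
                if not buf:
                    raise ValueError(f"empty key at position {pos}")
                key, buf, state = "".join(buf), [], VALUE
            elif ch.isalnum() or ch == "_":
                buf.append(ch)
            else:
                raise ValueError(f"unexpected {ch!r} in key at position {pos}")
        else:  # state == VALUE
            if ch == ";":
                out[key] = "".join(buf)
                buf, state = [], KEY
            elif ch == "=":
                raise ValueError(f"unexpected '=' in value at position {pos}")
            else:
                buf.append(ch)
    # Every pair must be terminated by ';', so the machine must end in KEY
    # with nothing buffered.
    if state != KEY or buf:
        raise ValueError("input ended in the middle of a pair")
    return out


print(parse_pairs("a=1;b=two;"))
```

Because each state accepts only a fixed set of characters, malformed input such as `a=1` (missing terminator) or `a b=1;` (space in a key) is rejected rather than guessed at.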
Sam Ruby: Scoping out a C++ HTML5 parser
As someone who attempted to keep an implementation of an HTML5 parser up-to-date for some period of time last year, I will say it’s time-consuming, thankless work, especially since the spec was changing a lot. Now that things have settled down a bit, it might be good to start making full implementations.
php-excel-reader - Project Hosting on Google Code
Extractomatic
Extractomatic is a simple API to detect and remove surplus clutter (such as adverts, headers, footers) around the main content of a web page. It uses the Boilerplate Java library, by Christian Kohlschütter.
jParser and jTokenizer released | Web 2.1
2009
enriquepablo / nl / wiki / Home — bitbucket.org
nl is a Python library that exposes a declarative API for building sentences and rules. These are used as input for a knowledge base built on the CLIPS production system. CLIPS builds a Rete network from the rules and sentences, which can then be queried efficiently for their consequences.
The main claim of nl is to offer a syntax that can accommodate any coherent theory that we may build in natural language (in the same sense as something like the semantic web's OWL-Full would), while at the same time being based on a simple finite-domain first-order theory. This theory is NL, a discussion of which can be found here. This discussion is probably required reading to understand the breadth and the limits of nl, but not to start using it.
ry's http-parser at master - GitHub
This is a parser for HTTP messages written in C. It parses both requests and responses. The parser is designed to be used in high-performance HTTP applications. It does not make any allocations, it does not buffer data, and it can be interrupted at any time. It only requires about 128 bytes of data per message stream (in a web server, that is per connection).








