PUBLIC marks

PUBLIC MARKS with tag taln

This year

Benoît Sagot - WOLF

by parmentierf
WOLF (Wordnet Libre du Français, "Free French WordNet") is a free lexical-semantic resource (a wordnet) for French.

Proxem > Home

by parmentierf (via)
Proxem isn't just a technology. It's a company and a vision informed by creativity and passion for all things possible with Natural Language Processing.

Cypher - Beta Release — monrai.com - ai - ai software - semantic web - semantic web software - ai company - natural language processing - natural language processing software - RDF, FOAF, Friend of a Friend, DC, Dublin Core, RSS, SeRQL and SPARQL softwa

by parmentierf (via)
The Cypher™ beta release is an AI software program that generates the .rdf (RDF graph) and .serql (SeRQL query) representations of plain-language input, allowing users to speak plain language to update and query databases. With robust definition languages, Cypher's grammar and lexicon can quickly and easily be extended to process highly complex sentences and phrases of any natural language, and can cover any vocabulary. Equipped with Cypher, programmers can now begin building next-generation semantic web applications that harness what is already the most widely used tool known to man: natural language.

2007

Double Metaphone - Wikipédia

by parmentierf (via)
Double Metaphone is a phonetic search algorithm written by Lawrence Philips and is the second generation of the Metaphone algorithm. Its implementation was described in June 2000 in the C/C++ Users Journal. It is called "Double" because it can return both a primary and a secondary code for a string; this accounts for ambiguous cases and for multiple variants with a common ancestry. For example, encoding the name "Smith" yields the primary code SM0 and the secondary code XMT, while the name "Schmidt" yields the primary code XMT and the secondary code SMT; the two have XMT in common.
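
The matching idea in the description above can be sketched in Python. This is only a minimal sketch and not an implementation of Double Metaphone itself: the code pairs are copied from the Smith/Schmidt example rather than computed, and `share_code` is a hypothetical helper name.

```python
# (primary, secondary) code pairs copied from the description above.
# Assumption: they are hard-coded here, not produced by a real encoder.
CODES = {
    "Smith": ("SM0", "XMT"),
    "Schmidt": ("XMT", "SMT"),
}


def share_code(name_a, name_b, codes=CODES):
    """Return the set of phonetic codes the two names have in common."""
    return set(codes[name_a]) & set(codes[name_b])


common = share_code("Smith", "Schmidt")
print(common)  # {'XMT'}
```

Two names are treated as phonetically close when this set is non-empty, which is how a secondary code lets "Smith" and "Schmidt" match even though their primary codes differ.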

TEL :: [tel-00145147, version 1] Définitions et caractérisations de modèles à base d'analogies pour l'apprentissage automatique des langues naturelles

by parmentierf
Definitions and characterizations of analogy-based models for the machine learning of natural languages

2006

Présentation de Theuth et de Blue Moon

by parmentierf (via)
A presentation of Theuth and Blue Moon. A new kind of parsing algorithm, described as "asyntagmatic". Without going into detail, the fact that the parsing is asyntagmatic unlocks everything: it becomes possible to take context into account, understand deictics, detect puns and spoonerisms, identify the language of a text, or translate texts in which several languages are mixed, even within the same sentence.

Official Google Research Blog: All Our N-gram are Belong to You

by parmentierf & 1 other (via)
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of one trillion words from public Web pages.
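
For readers unfamiliar with word n-grams, counting them can be sketched in a few lines of Python. This is an illustrative sketch only, not Google's pipeline: `ngram_counts` is a hypothetical helper, and the sample sentence is a made-up stand-in for a training corpus.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens in the list."""
    # zip over n shifted copies of the token list yields each n-gram as a tuple.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams)


# Assumption: whitespace tokenization of a toy sentence.
tokens = "there is no data like more data".split()
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

print(unigrams[("data",)])      # 2
print(bigrams[("more", "data")])  # 1
```

An n-gram language model is estimated from counts like these; the post's point is that the counts become more useful as the corpus grows.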

Python for Linguistics - py4lx

by parmentierf & 1 other (via)
This is an ickle collection of tutorials on using Python for doing interesting stuff with (human!) languages. They are posted initially on Hacklog, the Blogamundo developer blog, and then moved here, where they are endlessly tweaked to remove embarrassing errors and improve clarity. In theory they should be doable by folks with no programming background, or just a little.

Chatterbots, Tinymuds, and the Turing Test

by parmentierf (via)
This paper describes the development of one such Turing System, including the technical design of the program and its performance on the first three Loebner Prize competitions. We also discuss the program's four-year development effort, which has depended heavily on constant interaction with people on the Internet via Tinymuds (multiuser network communication servers that are a cross between role-playing games and computer forums like CompuServe). Finally, we discuss the design of the Loebner competition itself, and address its usefulness in furthering the development of Artificial Intelligence.

Charming Python: Get started with the Natural Language Toolkit

by parmentierf (via)
In this installment, David introduces you to the Natural Language Toolkit, a Python library for applying academic linguistic techniques to collections of textual data. Programming that goes by the name "text processing" is a start; other capabilities for syntactic and even semantic analysis are further specialized to studying natural languages.

L.Pointal - Python

by parmentierf & 8 others (via)
Laurent Pointal's Python page

Signes_Roche.pdf (objet application/pdf)

by cfrancois (via)
Terminology extraction by supervised learning

2005

SCIgen - An Automatic CS Paper Generator

by parmentierf & 4 others (via)
SCIgen is a program that generates random Computer Science research papers, including graphs, figures, and citations. It uses a hand-written context-free grammar to form all elements of the papers. Our aim here is to maximize amusement, rather than coherence. One useful purpose for such a program is to auto-generate submissions to "fake" conferences; that is, conferences with no quality standards, which exist only to make money. A prime example, which you may recognize from spam in your inbox, is SCI/IIIS and its dozens of co-located conferences (for example, check out the gibberish on the WMSCI 2005 website). Using SCIgen to generate submissions for conferences like this gives us pleasure to no end. In fact, one of our papers was accepted to SCI 2005! See Examples for more details.
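
Generating text from a hand-written context-free grammar, the technique the description mentions, might look roughly like the Python sketch below. The grammar, the `expand` function, and the fixed seed are all hypothetical and chosen for illustration; this is not SCIgen's actual grammar or code.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of productions.
# Any symbol that is not a key is treated as a terminal word.
GRAMMAR = {
    "SENTENCE": [["NP", "VP"]],
    "NP": [["the", "ADJ", "NOUN"], ["our", "NOUN"]],
    "VP": [["VERB", "NP"]],
    "ADJ": [["scalable"], ["distributed"]],
    "NOUN": [["framework"], ["algorithm"], ["methodology"]],
    "VERB": [["improves"], ["refines"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


rng = random.Random(0)  # seeded so repeated runs give the same sentence
sentence = " ".join(expand("SENTENCE", rng))
print(sentence)
```

Every output is grammatical by construction but carries no meaning, which matches the stated aim of maximizing amusement rather than coherence.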

EuroWordNet: Building a multilingual database with wordnets for several European languages.

by parmentierf (via)
EuroWordNet is a multilingual database with wordnets for several European languages (Dutch, Italian, Spanish, German, French, Czech and Estonian). The wordnets are structured in the same way as the American wordnet for English (Princeton WordNet, Miller et al. 1990) in terms of synsets (sets of synonymous words) with basic semantic relations between them. Each wordnet represents a unique language-internal system of lexicalizations. In addition, the wordnets are linked to an Inter-Lingual-Index, based on the Princeton wordnet. Via this index, the languages are interconnected so that it is possible to go from the words in one language to similar words in any other language. The index also gives access to a shared top-ontology of 63 semantic distinctions. This top-ontology provides a common semantic framework for all the languages, while language-specific properties are maintained in the individual wordnets. The database can be used, among other things, for monolingual and cross-lingual information retrieval, which was demonstrated by the users in the project.
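
The way an inter-lingual index connects per-language synsets can be sketched in Python. This is a toy model under stated assumptions: the index keys (`ili-dog`, `ili-house`), the word lists, and the `translate` function are hypothetical and are not taken from the EuroWordNet database or its tools.

```python
# Hypothetical toy data: one small "wordnet" per language, each mapping a
# shared index key to a synset (a set of synonymous words).
WORDNETS = {
    "fr": {"ili-dog": {"chien"}, "ili-house": {"maison"}},
    "es": {"ili-dog": {"perro"}, "ili-house": {"casa"}},
    "it": {"ili-dog": {"cane"}, "ili-house": {"casa"}},
}


def translate(word, source, target, wordnets=WORDNETS):
    """Go from a source-language word to target-language words via the index."""
    results = set()
    for ili_id, synset in wordnets[source].items():
        if word in synset:
            # The shared index key is the only link between the two languages.
            results |= wordnets[target].get(ili_id, set())
    return results


print(translate("chien", "fr", "es"))  # {'perro'}
```

Because each language is linked only to the index and never directly to another language, adding a new wordnet does not require pairwise mappings to every existing one.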
