This year
INIST at the 19th Festival International de Géographie - Institut de l’Information Scientifique et Technique
Developed by the Veille (monitoring) service of INIST/CNRS, the NIPPOGEO application lets users browse heterogeneous corpora dynamically:
- 764 bibliographic records from the Bibliographie Géographique Internationale (BGI), the Geography domain of the INIST/CNRS FRANCIS database,
- and 185 images (glass plates, slides and digital photographs) supplied by PRODIG researchers.
Freely accessible on the Internet, NIPPOGEO gives specialists in the field access to bibliographic and bibliometric data, and it also helps bring scientific information to the general public.
DLFP: JeuxDeMots: an online game for producing free lexical data
Beyond being fun, JeuxDeMots is interesting because it builds a lexical network from the answers players give.
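The idea of building a lexical network from player answers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not JeuxDeMots' actual method. Each (prompt word, answer) pair counts as one vote for a directed edge, and the raw vote count serves as the edge weight. The game's real weighting scheme and relation types are not described here, and the function names are hypothetical.

```python
from collections import defaultdict


def build_lexical_network(answers):
    """Build a directed, weighted graph: prompt word -> {answer word: weight}.

    `answers` is an iterable of (prompt_word, player_answer) pairs.
    Hypothetical weighting: each player answer adds 1 to that edge's weight.
    """
    network = defaultdict(lambda: defaultdict(int))
    for prompt, answer in answers:
        # Normalise case and surrounding spaces so "Cat" and "cat " are one node.
        network[prompt.strip().lower()][answer.strip().lower()] += 1
    # Convert to plain dicts so lookups of unknown words do not create entries.
    return {word: dict(edges) for word, edges in network.items()}


def strongest_associations(network, word, k=3):
    """Return up to k (associated_word, weight) pairs, heaviest first."""
    edges = network.get(word, {})
    # Sort by descending weight, then alphabetically to break ties.
    return sorted(edges.items(), key=lambda item: (-item[1], item[0]))[:k]


answers = [("cat", "mouse"), ("cat", "dog"), ("Cat", "mouse "), ("dog", "bone")]
net = build_lexical_network(answers)
print(strongest_associations(net, "cat"))
```

In this toy run, "mouse" ends up as the strongest association for "cat" because two players gave it. With enough players, such counts start to approximate shared intuitions about word association.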
Benoît Sagot - WOLF
WOLF (Wordnet Libre du Français) is a free semantic lexical resource (a wordnet) for French.
2007
Donate your speech to VoxForge using your telephone
VoxForge (http://www.voxforge.org) is an open source project that collects speech recordings for use in the creation of Acoustic Models. Speech recognition engines need an acoustic model to recognize speech. To create an acoustic model, you take a very large number of speech audio recordings and 'compile' them into statistical representations of the sounds that make up each word. Most open source speech recognition engines use 'closed source' acoustic models. VoxForge hopes to address this problem by creating a free GPL speech corpus and generating acoustic models from it.
You can now use your telephone to donate your speech. Click this link: http://www.voxforge.org/home/s… to get the number, and the Interactive Voice Response system will guide you through the process.
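The 'compiling' of recordings into statistical representations of sounds can be illustrated with a deliberately tiny Python sketch. It assumes two things: the audio has already been turned into feature vectors (for example MFCC frames), and each frame is already labelled with its phone. Under those assumptions, the sketch fits one diagonal Gaussian per phone. The function names and the single-Gaussian model are illustrative assumptions, not VoxForge's pipeline. Real engines such as Sphinx and HTK model phones with hidden Markov models, typically with Gaussian mixture outputs.

```python
import math
from collections import defaultdict


def train_phone_models(frames):
    """Estimate one diagonal Gaussian (mean, variance) per phone label.

    `frames` is a list of (phone_label, feature_vector) pairs. Assumption: the
    frames are already phone-aligned, which real toolkits do with HMMs.
    """
    grouped = defaultdict(list)
    for label, vec in frames:
        grouped[label].append(vec)
    models = {}
    for label, vecs in grouped.items():
        n, dim = len(vecs), len(vecs[0])
        mean = [sum(v[d] for v in vecs) / n for d in range(dim)]
        # Maximum-likelihood variance, floored so a constant dimension
        # does not cause division by zero when scoring.
        var = [max(sum((v[d] - mean[d]) ** 2 for v in vecs) / n, 1e-3)
               for d in range(dim)]
        models[label] = (mean, var)
    return models


def log_likelihood(vec, mean, var):
    """Log density of `vec` under a diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
               for x, m, s in zip(vec, mean, var))


def classify_frame(models, vec):
    """Pick the phone whose Gaussian gives the frame the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(vec, *models[label]))


frames = [("a", [0.0, 0.0]), ("a", [0.2, 0.0]),
          ("b", [5.0, 5.0]), ("b", [5.2, 5.0])]
models = train_phone_models(frames)
print(classify_frame(models, [0.1, 0.1]))
```

The sketch also shows why the source audio matters. Without the labelled recordings behind these means and variances, nobody can retrain or inspect the model.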
Europeana
Europeana is a prototype online library developed by the Bibliothèque nationale de France as part of the European Digital Library project.
Europeana brings together about 12,000 copyright-free documents from the collections of the BnF, the National Széchényi Library of Hungary and the National Library of Portugal.
Home - voxforge.org
VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines.
We will categorize and make available all submitted audio files under the GPL license, and then 'compile' them into Acoustic Models for use with Open Source Speech Recognition engines such as Sphinx, ISIP, HTK, and Julius.
Improving Open Source Speech Recognition
Speech Recognition Engines require two types of files to recognize speech: an Acoustic Model, created by 'compiling' a lot of transcribed speech into statistical models, and a Language Model (for Dictation) or Grammar file (for Command and Control).
Most Acoustic Models used by 'Open Source' Speech Recognition engines are 'Closed Source'. They do not give you access to the speech audio (the 'Source') used to create the Acoustic Model. The reason for this is that there is no free Speech Corpus in a form that can readily be used to create Acoustic Models for Speech Recognition Engines. Open Source projects are thus required to purchase a Speech Corpus which has restrictive licensing in order to create their Acoustic Models.
VoxForge (http://www.voxforge.org) was set up to address this problem. The site collects GPL transcribed speech audio from users which is then used to create Acoustic Models. These can then be used with Free and Open Source Speech Recognition Engines such as Sphinx, ISIP, Julius and/or HTK.
dbpedia.org - Using Wikipedia as a Web Database
dbpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. dbpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data.
2006
Official Google Research Blog: All Our N-gram are Belong to You
Here at Google Research we have been using word n-gram models for a variety of R&D projects, such as statistical machine translation, speech recognition, spelling correction, entity detection, information extraction, and others. While such models have usually been estimated from training corpora containing at most a few billion words, we have been harnessing the vast power of Google's datacenters and distributed processing infrastructure to process larger and larger training corpora. We found that there's no data like more data, and scaled up the size of our data by one order of magnitude, and then another, and then one more - resulting in a training corpus of one trillion words from public Web pages.
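At toy scale, a word n-gram model comes down to two steps: counting contiguous word sequences, then turning the counts into conditional probabilities. The minimal Python sketch below assumes whitespace tokenisation. It uses the unsmoothed maximum-likelihood estimate P(w | v) = c(v, w) / c(v, ·), where c(v, ·) is the number of bigrams that start with v. The function names are hypothetical, and the sketch says nothing about how Google's distributed pipeline works.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(corpus, n):
    """Count n-grams over a list of sentences (whitespace-tokenised)."""
    counts = Counter()
    for sentence in corpus:
        counts.update(ngrams(sentence.split(), n))
    return counts


def bigram_mle(bigram_counts, prev, word):
    """Unsmoothed P(word | prev) = c(prev, word) / c(prev, .)."""
    # History count: how often `prev` is followed by any word at all.
    history = sum(c for (w1, _), c in bigram_counts.items() if w1 == prev)
    if history == 0:
        return 0.0  # `prev` never seen as a history; no estimate possible
    return bigram_counts[(prev, word)] / history


corpus = ["the cat sat", "the cat ran", "the dog sat"]
bigrams = count_ngrams(corpus, 2)
print(bigram_mle(bigrams, "the", "cat"))
```

Unsmoothed estimates give zero probability to any sequence never seen in training. That is one reason larger training corpora, as in the post, pay off.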
2005
Freely accessible texts
Links to sites that offer free texts!
start [WaCky]
The WaCky Project is a nascent effort (I always liked the expression nascent effort) by a group of linguists to build or gather tools to use the web as a linguistic corpus.
2004
Natural Language Toolkit
The Natural Language Toolkit is a suite of Python packages and data for natural language processing; it comes with extensive API documentation and tutorials. NLTK-Lite is the version under active development.
La Bibliothèque électronique du Québec
The library publishes public-domain texts by authors from around the world.
