1963 Time Magazine corpus
2000 NIST Speaker Recognition Evaluation Corpus
A Syntactically Annotated Corpus of German Newspaper Texts
A Web Corpus and Topic Signatures for All WordNet 1.6 Nominal Senses (v 1.0)
AOT
Alpino Treebank
An Empirical Grammar of the English Verb System
Annotated list of resources on statistical NLP and corpus-based CL
Arabic Newswire Part 1
Arabic first names (female)
Arabic first names (male)
BNC Online Service
BRITISH NATIONAL CORPUS - WORLD EDITION
Base Textuelle de Moyen Francais
Bokr Russian Reference Corpus
Browse the Reuters-21578 collection
CETEMPUBLICO
CIRCLE Tutorial Archive
COMPUTER RESEARCH LABORATORY ANONYMOUS FTP
CORPUS DEL ESPANOL
CREA
Collections of texts and corpora
Corpora at ELSNET
Corpus Resources (Chulalongkorn University, Thailand)
Corpus de referencia de la lengua Espanola contemporanea: corpus oral peninsular
Corpus del Espanol
Corpus of spoken Bulgarian
Cranfield collection
Czech National Corpus
Danish news corpus
ELRA Corpus Catalogue
Edinburgh Associative Thesaurus (EAT)
English [DIR: 46 entries] ...
EuroWordNet
Experimental Corpus Query System (University of Stuttgart, Germany)
Finnish text bank
French [DIR: 0 entries] ...
GENIA corpus version 3.0p
German [DIR: 6 entries] ...
German Corpora, Online Search
HAITIAN CREOLE ELECTRONIC TEXTS
HCRC Map Task Corpus XML annotations
HPSG-based Syntactic Treebank of Bulgarian
Hansards Corpus - Searchable
Hebrew Corpora
Helsinki Corpus of Swahili (HCS)
ICOPOST
IMS Corpus Toolbox, Univ. of Stuttgart
IMS Corpus Workbench (CWB)
IPI PAN Polish Corpus
Information Retrieval Laboratory (University of Harbin) Chinese Corpus Resources
International Corpus of Learner English
Kiel University's Institute on Phonetics and Speech Processing
LANGUAGE LEARNING CENTER - ACADEMIC CORPUS
Laboratorio de Engenharia da Linguagem - Portuguese corpora
Lacio Web Corpora
Le corpus BAF (French and English)
Linguistic Data Consortium (LDC) FTP site
Links to French corpora
List of Language Lists (Version 1.C)
List of stop words
Lists of Corpora
MICASE Michigan Corpus of Academic Spoken English
Manuel Barbera: General Corpora and Corpus Linguistics Resources
Medlars collection
Michigan Corpus of Academic Spoken English
Miscellaneous Word Lists from Oxford University
Miscellaneous corpora-related URL [DIR: 0 entries] ...
Morphologically Analyzed and Disambiguated Turkish News Text
Multilingual [DIR: 39 entries] ...
Multilingual Text Tools and Corpora
Name lists from US census
Nexing Corpus
OPUS -- An Open Source Parallel Corpus
On-line books at CMU
Oxford Text Archive
Oxford Text Archive Corpus of Italian Newspapers
Parallel Corpora
Parallel Corpora (United Nations) from the LDC
Parallel Corpora from the World Health Organization
Parallel Texts of Hong Kong Laws
Penn-Helsinki Parsed Corpus of Middle English
Polish subcorpus of the International Corpus of Learner English
Project Gutenberg
Prototype Corpus of Contemporary Arabic (CCA)
Ramon Piero Center for Research
Reuters Corpus
Romanian NLP
Russian [DIR: 0 entries] ...
Russian Corpora
Russian Corpus Page
Russian Corpus Site
Russian Newspaper Corpus
Russicon Resources
Sanskrit Library
ShATR - a multi-simultaneous-speaker corpus
Slovene-English Parallel Corpus
Spanish [DIR: 0 entries] ...
Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio
Speech in Noisy Environments 2 (SPINE2 CODED) Coded Audio
Stop List
Survey of Electronic Corpora (by Jane A. Edwards, file at CMU)
Survey of English Usage, University College, London
Swedish [DIR: 3 entries] ...
Switchboard Transcription Project
TELRI Research Archive of Computational Tools and Resources
TRAINS93 Dialog transcripts
Terminology for more than 15 languages
The British National Corpus
The British National Corpus Survey: An Edited Letter from Lou Burnard
The CORPORA DataCenter (Norway)
The Childes Corpus - Children's language
The International Corpus of English
The Moby Corpus
The Moby corpus
The Oslo Corpus of Bosnian Texts
The Probert Encyclopedia
The Reading Academic Text Corpus
The Sketch Engine
The Sofie Treebank - A Parallel Treebank of North European Languages
The Bank of English
Top 10 words used on Usenet
Towards a Corpus of Corrected Student Translations
Treebank tokenization scheme
Voice of America (VOA) Czech Broadcast News Audio
Voice of America (VOA) Czech Broadcast News Transcript Corpus
Word frequency lists
a corpus of student-advisor advising sessions (by Michael Elhadad)
list of Japanese transitive - intransitive verb pairs
Total number of entries in system: 4260, Last updated: Mon Aug 14 14:03:55 EDT 2006