Sample Corpora
To load a corpus, go to its directory, download the .txm binary file, then use the 'File > Load' command in TXM.
Sources are also provided for some corpora, so you can import from them instead and tune the corpus.
Written texts
French
- discours: corpus of various French presidents’ speeches, published by Damon Mayaffre.
- fleurs-du-mal: Les Fleurs du mal (The Flowers of Evil) by Charles Baudelaire, edition of Jean-Marie Viprey.
- mpt: corpus of French National Assembly debates on the "Mariage pour tous" law of 2013 from the mariagepourtousInXML project.
- quete-du-graal-tei: Queste del Saint Graal (Quest for the Holy Grail), edition of Christiane Marchello-Nizia and Alexei Lavrentiev, based on 'Lyon, Palais des Arts 77 (ms. K) (fol. 160a-224d)' and 'Paris, BNF n. acq. fr. 1119 (ms. Z)' ca. 1225 or 1230 Old French manuscripts.
- tdm80j: Le tour du monde en quatre-vingts jours (Around the World in Eighty Days), Jules Verne, 1873, edition of J. Hetzel et Cie. Synoptic edition with Wikisource facsimile images.
- txm-odt-manual: TXM User's manual as a TXM corpus.
- voeux: See voeux-fr.
- voeux-fr: corpus of the 51 New Year’s Day speeches delivered by French presidents from 1959 to 2009, published by Jean-Marc Leblanc.
English
- brown: corpus of 500 texts written in American English in 1961, published by W. N. Francis and H. Kucera (this version is based on the XML TEI version from the NLTK project).
- leviathan: Leviathan by Thomas Hobbes (1588-1679). XML-TEI P5 text sample from the EEBO-TCP Phase 1 project.
German
- voeux-rfa: corpus of the Christmas and the New Year's addresses delivered by the Presidents and the Chancellors of the Federal Republic of Germany since 1987, contributed by Sascha Diwersy, Universität zu Köln.
Record transcriptions (synchronized)
Parallel corpora (multilingual)
- uno-tmx-sample: sample of United Nations General Assembly Resolutions: A Six-Language Parallel Corpus (Arabic, Chinese, English, French, Russian and Spanish), http://www.uncorpora.org [Alexandre Rafalovitch, Robert Dale. 2009. United Nations General Assembly Resolutions: A Six-Language Parallel Corpus. In Proceedings of the MT Summit XII, pages 292-299, Ottawa, Canada, August]. Import it with the XML-TMX import module.
Annotated corpora
Some corpora are also available from the TXM demo portal: http://portal.textometrie.org/demo/?locale=en.