CLLD - Cross-Linguistic Linked Data

Publication models

There are three major ways in which the CLLD project helps publish cross-linguistic datasets:

1. Large databases may be published as standalone CLLD apps under the umbrella of the clld.org series, much like a book in an edited series.
2. Smaller datasets may be submitted to one of the database journals started by the CLLD project.
3. Datasets may be hosted independently of the CLLD project, simply reusing the clld software.

Published datasets

The following datasets are maintained by the CLLD project, i.e. they fall into categories 1 and 2 above:

Name	Description	Editors	CLDF dataset on Zenodo
WALS Online	The World Atlas of Language Structures	Matthew Dryer & Martin Haspelmath
WOLD	The World Loanword Database	Martin Haspelmath & Uri Tadmor
APiCS Online	The Atlas of Pidgin and Creole Language Structures	Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber
ValPaL	Valency Patterns Leipzig	Iren Hartmann, Martin Haspelmath & Bradley Taylor
eWAVE	The Electronic World Atlas of Varieties of English	Bernd Kortmann & Kerstin Lunkenheimer
AfBo	A world-wide survey of affix borrowing	Frank Seifart
IDS	The Intercontinental Dictionary Series	Bernard Comrie & Hans-Jörg Bibiko
ASJP	The database of the Automated Similarity Judgement Program	Søren Wichmann et al.
Numerals	Numerals in the World’s Languages	Eugene Chan
Glottolog	Catalog of all languages, families and dialects, with comprehensive reference information	Harald Hammarström, Martin Haspelmath, Robert Forkel & Sebastian Bank
SAILS Online	The South American Indigenous Language Structures Online	Harald Hammarström
PHOIBLE Online	The world's largest database of phonological inventories	Steven Moran, Daniel McCloy and Richard Wright
Tsammalex	A multilingual lexical database on plants and animals	Christfried Naumann & Steven Moran & Guillaume Segerer & Robert Forkel
CSD	The Comparative Siouan Dictionary	Rankin, Robert L. & Carter, Richard T. & Jones, A. Wesley & Koontz, John E. & Rood, David S. & Hartmann, Iren
Concepticon	The Concepticon	List, Johann Mattis & Rzymski, Christoph & Greenhill, Simon & Schweikhard, Nathanael & Pianykh, Kristina & Tjuka, Annika & Wu, Mei-Shin & Forkel, Robert
Dogonlanguages	Dogon and Bangime Linguistics	Moran, Steven & Forkel, Robert & Heath, Jeffrey
Dictionaria	Open-access journal publishing dictionaries from all over the world	Chief editors: Haspelmath, Martin & Stiebels, Barbara; Managing editor: Hartmann, Iren
LDH	The Language Description Heritage library	Managing editor: Robert Forkel	Community on Zenodo
TuLeD	Tupían Lexical Database	Fabrício Ferraz Gerardi and Stanislav Reichert

Among datasets in category 3 above, the following have come to our attention:

Name	Description	Editors
NorthEuraLex	Lexicostatistical Database of Northern Eurasia	Johannes Dellert and Gerhard Jäger
MosLex	Moscow Lexical Database	Alexei Kassian
DoReCo	DoReCo (Language DOcumentation REference COrpus)	Frank Seifart et al.

Update policy

CLLD datasets follow the update model of traditional publications: errata and additions are collected until a new edition of the dataset is released. Typically, we aim to release no more than one edition per year.

But since we still want to exploit the fact that online publications can be updated continuously, we distinguish two categories of data:

Core data: the contributions of the dataset to research, i.e. the citeable content, such as value assignments in typological databases. This type of data can only be updated with a new edition, since we want to make it easy to identify and cite exact versions of a dataset.
Supplemental data: data that may be added to a dataset to enhance navigation within it or to enable visualization. Examples of this kind of data are geo-coordinates for languages and bibliographical information for sources. Data in this category may be updated at any time, although we will still keep track of what is changed and when.

Data reuse

CLLD data is meant to be easily reusable, and we would love to hear about cases where it has been reused, whether in research or in teaching.
