CLLD – Cross-Linguistic Linked Data

Publication models

The CLLD project supports the publication of cross-linguistic datasets in three main ways:

  1. Large databases may be published as standalone CLLD apps under the umbrella of the clld.org series, much like a book in an edited series.
  2. Smaller datasets may be submitted to one of the database journals started by the CLLD project.
  3. Datasets may be hosted independently of the CLLD project, simply re-using the clld software.

Published datasets

The following datasets are maintained by the CLLD project, i.e. they fall into category 1 or 2 above:

Name | Description | Editors | CLDF dataset on ZENODO
WALS Online | The World Atlas of Language Structures | Matthew Dryer & Martin Haspelmath | DOI
WOLD | The World Loanword Database | Martin Haspelmath & Uri Tadmor | DOI
APiCS Online | The Atlas of Pidgin and Creole Language Structures | Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber | DOI
ValPaL | Valency Patterns Leipzig | Iren Hartmann, Martin Haspelmath & Bradley Taylor |
eWAVE | The Electronic World Atlas of Varieties of English | Bernd Kortmann & Kerstin Lunkenheimer | DOI
AfBo | A world-wide survey of affix borrowing | Frank Seifart | DOI
IDS | The Intercontinental Dictionary Series | Bernard Comrie & Hans-Jörg Bibiko | DOI
ASJP | The database of the Automated Similarity Judgement Program | Søren Wichmann et al. | DOI
Numerals | Numerals in the World’s Languages | Eugene Chan |
Glottolog | Catalog of all languages, families and dialects, with comprehensive reference information | Harald Hammarström, Martin Haspelmath, Robert Forkel & Sebastian Bank | DOI
SAILS Online | The South American Indigenous Language Structures Online | Harald Hammarström | DOI
PHOIBLE Online | The world's largest database of phonological inventories | Steven Moran, Daniel McCloy and Richard Wright | DOI
Tsammalex | A multilingual lexical database on plants and animals | Christfried Naumann & Steven Moran & Guillaume Segerer & Robert Forkel |
CSD | The Comparative Siouan Dictionary | Rankin, Robert L. & Carter, Richard T. & Jones, A. Wesley & Koontz, John E. & Rood, David S. & Hartmann, Iren |
Concepticon | The Concepticon | List, Johann Mattis & Rzymski, Christoph & Greenhill, Simon & Schweikhard, Nathanael & Pianykh, Kristina & Tjuka, Annika & Wu, Mei-Shin & Forkel, Robert |
Dogonlanguages | Dogon and Bangime Linguistics | Moran, Steven & Forkel, Robert & Heath, Jeffrey |
Dictionaria | Open-access journal publishing dictionaries from all over the world | Chief editors: Haspelmath, Martin & Stiebels, Barbara; Managing editor: Hartmann, Iren |
LDH | The Language Description Heritage library | Managing editor: Robert Forkel | Community on Zenodo
TuLeD | Tupían Lexical Database | Fabrício Ferraz Gerardi and Stanislav Reichert |

Among datasets in category 3 above, the following have come to our attention:

Name | Description | Editors
NorthEuraLex | Lexicostatistical Database of Northern Eurasia | Johannes Dellert and Gerhard Jäger
MosLex | Moscow Lexical Database | Alexei Kassian
DoReCo | DoReCo (Language DOcumentation REference COrpus) | Frank Seifart et al.

Update policy

CLLD datasets follow the update model of traditional publications: errata and additions are collected until a new edition of the dataset is released. We typically aim for no more than one edition per year.

However, since online publications can be updated continuously and we want to take advantage of this, we distinguish two categories of data:

Core data
represents the dataset's contribution to research, i.e. the citeable content, such as value assignments in typological databases. This type of data can only be updated with a new edition, so that exact versions of a dataset remain easy to identify and cite.
Supplemental data
may be added to a dataset to enhance navigation within it or to enable visualization. Examples of this kind of data are geo-coordinates for languages and bibliographical information for sources. Data in this category may be updated at any time, although we still keep track of what is changed and when.
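
The distinction between the two categories can be illustrated with a small sketch. This is only a toy model of the policy as described above, not part of the clld software: the names Category, Change, Dataset, submit and release_edition are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    CORE = "core"                  # citeable content, e.g. value assignments
    SUPPLEMENTAL = "supplemental"  # e.g. geo-coordinates, bibliographical information


@dataclass
class Change:
    description: str
    category: Category


@dataclass
class Dataset:
    """Hypothetical model of a dataset under the update policy."""
    edition: int = 1
    # Core changes collected until the next edition is released.
    pending: list = field(default_factory=list)
    # (edition, description) for every change that has been applied.
    changelog: list = field(default_factory=list)

    def submit(self, change: Change) -> bool:
        """Apply supplemental changes at once; queue core changes.

        Returns True if the change was applied immediately.
        """
        if change.category is Category.SUPPLEMENTAL:
            self.changelog.append((self.edition, change.description))
            return True
        self.pending.append(change)
        return False

    def release_edition(self) -> int:
        """Publish a new edition that includes all queued core changes."""
        self.edition += 1
        for change in self.pending:
            self.changelog.append((self.edition, change.description))
        self.pending.clear()
        return self.edition
```

In this sketch, a corrected geo-coordinate is recorded against the current edition right away, whereas a corrected value assignment only takes effect once release_edition is called, so any citation of a given edition always refers to the same core data.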

Data reuse

CLLD data is meant to be easily re-usable, and we would love to hear about cases where it has been reused, whether in research or in teaching.