CLLD – Cross-Linguistic Linked Data

Publication models

The CLLD project supports the publication of cross-linguistic datasets in three major ways:

  1. Large databases may be published as standalone CLLD apps under the umbrella of the series, much as a book is published in an edited series.
  2. Smaller datasets may be submitted to one of the database journals started by the CLLD project.
  3. Datasets may be hosted independently of the CLLD project, simply reusing the clld software.

Published datasets

The following datasets are maintained by the CLLD project, i.e. fall into categories 1 and 2 above:

Name | Description | Editors | DOI for archived version at Zenodo
WALS Online | The World Atlas of Language Structures | Matthew Dryer & Martin Haspelmath | 10.5281/zenodo.11040
WOLD | The World Loanword Database | Martin Haspelmath & Uri Tadmor | 10.5281/zenodo.11137
APiCS Online | The Atlas of Pidgin and Creole Language Structures | Susanne Maria Michaelis, Philippe Maurer, Martin Haspelmath, and Magnus Huber | 10.5281/zenodo.11135
ValPaL | Valency Patterns Leipzig | Iren Hartmann, Martin Haspelmath & Bradley Taylor |
eWAVE | The Electronic World Atlas of Varieties of English | Bernd Kortmann & Kerstin Lunkenheimer |
AfBo | A world-wide survey of affix borrowing | Frank Seifart |
IDS | The Intercontinental Dictionary Series | Bernard Comrie & Hans-Jörg Bibiko |
ASJP | The database of the Automated Similarity Judgement Program | Søren Wichmann et al. |
Numerals | Numerals in the World’s Languages | Eugene Chan |
Glottolog | Catalog of all languages, families and dialects, with comprehensive reference information | Harald Hammarström, Martin Haspelmath & Robert Forkel |
SAILS Online | The South American Indigenous Language Structures Online | Harald Hammarström |
PHOIBLE Online | The world's largest database of phonological inventories | Steven Moran, Daniel McCloy and Richard Wright | 10.5281/zenodo.11706
Tsammalex | A multilingual lexical database on plants and animals | Christfried Naumann & Steven Moran & Guillaume Segerer & Robert Forkel | 10.5281/zenodo.17571
CSD | The Comparative Siouan Dictionary | Rankin, Robert L. & Carter, Richard T. & Jones, A. Wesley & Koontz, John E. & Rood, David S. & Hartmann, Iren | 10.5281/zenodo.19782

Update policy

CLLD datasets follow the update model of traditional publications: errata and additions are collected until a new edition of the dataset is released. Typically, we aim to release no more than one edition per year.

However, since online publications can be updated continuously and we want to take advantage of this, we distinguish two categories of data:

Core data
represents the dataset's contribution to research, i.e. the citable content, such as value assignments in typological databases. This type of data can only be updated with a new edition, since we want to make it easy to identify and cite exact versions of a dataset.
Supplemental data
may be added to a dataset to enhance navigation within it or to enable visualization. Examples of this kind of data are geo-coordinates for languages and bibliographical information for sources. Data in this category may be updated at any time, although we still keep track of what is changed and when.
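
The distinction between the two categories can be illustrated with a small Python sketch. It is only a toy model of the policy described above, not part of the clld software: the class `Dataset`, its methods, and the example keys and values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Dataset:
    """Hypothetical model of a dataset under the update policy."""
    edition: int = 1
    core: dict = field(default_factory=dict)          # citable content, fixed per edition
    supplemental: dict = field(default_factory=dict)  # e.g. coordinates, bibliography
    pending_core: dict = field(default_factory=dict)  # errata/additions awaiting release
    changelog: list = field(default_factory=list)     # (date, key, old, new) records

    def update_supplemental(self, key, value, when=None):
        """Supplemental data may change at any time, but each change is logged."""
        when = when or date.today()
        self.changelog.append((when, key, self.supplemental.get(key), value))
        self.supplemental[key] = value

    def propose_core_change(self, key, value):
        """Core changes are only collected; the published edition stays untouched."""
        self.pending_core[key] = value

    def release_edition(self):
        """Apply all collected core changes at once and bump the edition number."""
        self.core.update(self.pending_core)
        self.pending_core.clear()
        self.edition += 1
        return self.edition


# Invented example data: one value assignment and one coordinate.
ds = Dataset(core={"feature-1/lang-x": "value-a"})
ds.update_supplemental("lang-x/latitude", 12.5)        # takes effect immediately
ds.propose_core_change("feature-1/lang-x", "value-b")  # waits for the next edition
print(ds.edition, ds.core)                             # edition 1 is unchanged
ds.release_edition()
print(ds.edition, ds.core)                             # edition 2 carries the erratum
```

In this sketch, a citation of "edition 1" always refers to the same core content, while the change log records what was changed in the supplemental data and when.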

Data reuse

CLLD data is meant to be easily reusable, and we would love to hear about cases where it has been reused, be it in research or teaching.