CLLD – Cross-Linguistic Linked Data

Helping collect the world's language diversity heritage.

The Cross-Linguistic Linked Data project develops and curates interoperable data publication structures, using Linked Data principles as the integration mechanism for distributed resources.

This approach allows

  • small-scale efforts to publish individual databases such as WALS (World Atlas of Language Structures) or WOLD (World Loanword Database), preserving the brands established by these projects,
  • while at the same time providing a unified user experience across publications.

Within the project, this approach is applied to publishing lexical and grammatical databases already compiled at the MPI-EVA and elsewhere. This work has led to a software framework that can be used to develop database journals, i.e. edited collections of databases submitted by linguists from around the world.

A list of databases implemented as clld applications and published on the CLLD platform is available following the Datasets link.

Dictionaria, a journal of dictionaries of less widely studied languages edited by Martin Haspelmath & Barbara Stiebels, runs on the clld framework and has already published 10 dictionaries.

To link linguistic data unambiguously to languages, a unique code is needed for each language and each variety. For this reason, the CLLD project also comprises:

  • Glottolog (catalog of all languages, families and dialects, with comprehensive reference information), edited by Harald Hammarström, Martin Haspelmath & Robert Forkel


Arguably the most important outcome of the CLLD project was the specification of the CLDF standard. CLDF specifies formats and guidelines for storing linguistic datasets as interrelated plain text files, facilitating

  • long-term archiving and FAIR access to such datasets via repositories like Zenodo,
  • a standardized submission format for journals such as Dictionaria,
  • simplified creation of clld applications from CLDF module-specific blueprints.
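To illustrate the idea of "interrelated plain text files", the following sketch builds a tiny, CLDF-inspired dataset with Python's standard csv module: a values table references a languages table by ID, mimicking how CLDF links tables across files. The file contents, column names, and data are simplified assumptions for illustration, not the full CLDF specification (real CLDF datasets additionally carry JSON metadata describing the tables and their relations).

```python
import csv
import io

# Toy stand-ins for two CLDF-style tables that would live in separate
# plain text files, e.g. languages.csv and values.csv.
languages_csv = """ID,Name,Glottocode
stan1295,German,stan1295
"""

# Each value row points back to a language row via Language_ID.
# Parameter and value here are toy data, not actual survey results.
values_csv = """ID,Language_ID,Parameter_ID,Value
v1,stan1295,word_order,SVO
"""

# Index the languages table by ID so cross-file references can be resolved.
languages = {row["ID"]: row for row in csv.DictReader(io.StringIO(languages_csv))}
values = list(csv.DictReader(io.StringIO(values_csv)))

# Resolve the reference: join each value to the language it describes.
for v in values:
    lang = languages[v["Language_ID"]]
    print(f"{lang['Name']}: {v['Parameter_ID']} = {v['Value']}")
```

Because the files are plain CSV, the same join can be done by any tool, from spreadsheets to SQL, which is what makes the format attractive for archiving and reuse.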

Using CLDF datasets as "input" for clld applications also solves one of the bigger problems of publishing data in a web application: how to handle multiple versions of the data. With CLDF, datasets can be versioned and multiple versions can be published in a repository, while the web application serves as a browsable interface to the latest published version.