CLLD - Cross-Linguistic Linked Data

The Austronesian Comparative Dictionary

posted Friday, March 17, 2023 by Robert Forkel

Robert Blust’s The Austronesian Comparative Dictionary - web edition (ACD) has been online for a long time - and actually still is (again) at https://www.trussel2.com/ACD/ . However, after the unexpected passing of Steve Trussel, the site has no longer been updated and can now be regarded as a legacy resource.

clld 10.0.0 - 10 years of clld

posted Wednesday, January 18, 2023 by Robert Forkel

After 10 years of development it seems to be the right time for release 10.0.0 of the clld framework. And maybe it’s also time for some retrospection.

A new kind of clld app

posted Thursday, March 25, 2021 by Robert Forkel

We recently published TuLar - the Tupían Language Resources site - and the Hindu Kush Areal Typology site. Both are examples of a new kind of clld app - one that aggregates different kinds of linguistic data with an areal focus, rather than collecting data of the same kind (e.g. typological questionnaires) globally. While this kind of customization was always possible within the clld framework, it can now be done more efficiently and in a more principled way. In the following we describe how.

Retro-digitizing the "Tableaux phonétiques des patois suisses romands"

posted Thursday, September 24, 2020 by Robert Forkel

Exploiting CLDF and the enhanced CLDF support in clld 7 made collaboration on retro-digitizing Gauchat’s “Tableaux phonétiques des patois suisses romands” from 1925 easy. Read more in the preprint A digital, retro-standardized edition of the Tableaux phonétiques des patois Suisses romands (TPPSR) and below.

CLLD News

posted Thursday, May 07, 2020 by Robert Forkel

With clld 7.1 out of the door it’s time for an update on CLLD-related activities.

PHOIBLE 2.0 released

posted Thursday, February 14, 2019 by Robert Forkel

We are happy to announce the release of PHOIBLE 2.0. This release adds almost 1,000 new phoneme inventories for almost 500 new languages over PHOIBLE 1.0. It also includes information about allophones for many inventories.

What's up with CLLD?

posted Wednesday, March 07, 2018 by Robert Forkel

While this blog has been quiet for some time, this does not mean there is nothing to report. In fact, there has been a lot of activity, ranging from major updates of our servers and our core libraries (e.g. the clld package), to updates of several CLLD databases and work on new standards.

Glottolog 3.0 released

posted Wednesday, March 29, 2017 by Robert Forkel

We are happy to announce the release of Glottolog 3.0. This is the first release of Glottolog completely based on data curated in the public GitHub repository clld/glottolog.

Concepticon 1.0 released

posted Thursday, May 12, 2016 by Robert Forkel

We are happy to announce the release of Concepticon 1.0, a resource for the linking of concept lists. Our resource presents an attempt to link the large number of different concept lists which are used in the linguistic literature, ranging from Swadesh lists in historical linguistics to naming tests in clinical studies and psycholinguistics.

Mapping Glottocodes to ISO 639-3

posted Friday, November 13, 2015 by Robert Forkel

A lot of data about languages marks the associated language either using its Glottocode (i.e. its identifier according to Glottolog) or its ISO 639-3 code. So often, when merging data from various sources, the issue of mapping between the two code systems comes up.

CLLD at FORGE 2015

posted Monday, September 21, 2015 by Robert Forkel

An abstract describing the CLLD project and how we use GitHub was accepted at FORGE 2015. So last Friday I gave a talk at this event, which aimed at bringing together “digital humanities practitioners” and the emerging Humanities Data Centers.

Data curation strategies of CLLD databases

posted Thursday, July 09, 2015 by Robert Forkel

Two and a half years into the CLLD project we see two main strategies of coping with the versioning problem. In this post I will describe these strategies as exemplified by the WALS and Tsammalex databases, respectively, and investigate their strengths and weaknesses.

Language Comparison with Linguistic Databases - LanCLiD 2

posted Monday, May 11, 2015 by Robert Forkel

On April 30, just before the big closing conference of the Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology, we hosted the second of what we hope will become a series of workshops on Language Comparison with Linguistic Databases.

Glottolog 2.4 released

posted Tuesday, April 07, 2015 by Robert Forkel

Two weeks ago we released version 2.4 of Glottolog - our comprehensive language catalog and bibliography. This update includes the addition of 76 languages resulting in moderate corrections of the classification, as well as two changes which are worth highlighting: 68,131 new references and a new collaborative curation model.

The Open Source analogy for research data curation

posted Tuesday, February 03, 2015 by Robert Forkel

We are obviously not the first ones to have come up with the idea of using GitHub for collaborative data curation.

What open means - a case study

posted Wednesday, January 21, 2015 by Robert Forkel

Tsammalex, a multilingual lexical database on plants and animals, uses WWF Ecoregions as one facet to navigate species. Thus whenever new species are added to the database, we have to answer the question: which ecoregions are populated by these species?

The Second Year

posted Wednesday, January 21, 2015 by Robert Forkel

As it turns out, our predictions for main work packages in 2014 have been rather inaccurate, to say the least.

Default Unicode Collation using pg_collkey

posted Friday, January 16, 2015 by Robert Forkel

A problem which recently popped up again, and for which we eventually found a satisfying solution, is described as multilingual sorting in an article about the Oracle database system.

Language Comparison with Linguistic Databases - RefLex and Typological Databases

posted Friday, October 10, 2014 by Robert Forkel

I spent the last couple of days in Nijmegen, having had the good fortune of being invited to the workshop Language Comparison with Linguistic Databases: RefLex and Typological Databases at the Max Planck Institute for Psycholinguistics.

If you build it, they will come

posted Monday, September 15, 2014 by Robert Forkel

The motto from Field of Dreams

If you build it, they will come!

is often seen as the wrong attitude towards building web applications in general; its variant "If you publish it on the web, they will come" often works for research data, though. And what's even better, they may provide added value on top of your data!

PHOIBLE Online

posted Monday, September 15, 2014 by Robert Forkel

On September 12, 2014, PHOIBLE Online, the world's largest database of phonological inventories, by Steven Moran, Daniel McCloy and Richard Wright, was published as a CLLD database.

CLLD and OLAC

posted Monday, September 15, 2014 by Robert Forkel

Checking the success of the newest CLLD archive registration with OLAC (the Open Language Archives Community) I thought it was time for some bragging.

csv support in clld applications

posted Monday, July 28, 2014 by Robert Forkel

With the clld framework it has always been easy to provide custom representations of the resources in a database. As of version 0.15 this mechanism is used in the core framework to provide csv and csv metadata representations for any datatable in a clld app.

Citing CLLD Databases and Reproducible Research

posted Monday, July 28, 2014 by Robert Forkel

ZENODO's integration with GitHub provides a service that brings citeability and usability of CLLD databases for reproducible research to a new level.

Glottolog 2.3 is out and a minor update of WALS

posted Friday, July 04, 2014 by Robert Forkel

Last week saw updates for two of our flagship datasets: Glottolog (our comprehensive language catalog and bibliography) and WALS Online (the World Atlas of Language Structures Online).

New domain for the World Loanword Database

posted Tuesday, June 03, 2014 by Robert Forkel

For a project setting out to publish Linked Data on the web this is close to the worst case scenario, but for reasons beyond our control we have to serve the World Loanword Database under a new domain. Starting June 3, 2014 WOLD should be accessed using the domain wold.clld.org instead of the old domain wold.livingsources.org.

SAILS Online published as CLLD database

posted Friday, April 04, 2014 by Robert Forkel

Yesterday we published the South American Indigenous Language Structures (SAILS) Online - a large database of grammatical properties of languages gathered from descriptive materials (such as reference grammars) by a team from the Languages in Contact Group (LinC) at Radboud University Nijmegen directed by Pieter Muysken.

CLLD at the 3rd Workshop on Linked Data in Linguistics

posted Friday, April 04, 2014 by Robert Forkel

A paper describing the CLLD project and the role Linked Data plays in our publishing strategy was accepted at the 3rd Workshop on Linked Data in Linguistics (LDL-2014) - co-located with LREC 2014 - to be held in Reykjavik, Iceland on May 27.

CLLD at DHd "Digital Humanities im deutschsprachigen Raum" 2014

posted Monday, March 24, 2014 by Robert Forkel

A poster presenting the CLLD project was accepted at DHd 2014 - the first conference of the association "Digital Humanities im deutschsprachigen Raum" - to be held in Passau from March 26 to March 28.

Exploring Glottolog with Python

posted Tuesday, February 18, 2014 by Robert Forkel

Sebastian Bank from the University of Leipzig shared an IPython notebook which can serve as a tutorial for Exploring Glottolog with Python.

I think this is a fantastic resource using CLLD data the way it was meant to be used, but also showcasing

how to efficiently leverage the tools of the Python eco-system for scientific computing,
how to cope with data expressed in RDF – a data model more expressive but also more complicated than the typical arrangement of data in tables,
what reproducible research could look like.

clld on PyPI

posted Monday, February 10, 2014 by Robert Forkel

With people in Zurich and Nijmegen having installed the clld framework successfully (and on different platforms, no less), it was time to make a “real” Python package, i.e. publish clld on the Python Package Index. So here it is: clld on PyPI. This means that you can now install clld simply by running

pip install clld

The First Year

posted Friday, January 03, 2014 by Robert Forkel

After the first of four years of funding it is time to review what has been achieved in 2013 and to outline the next milestones of the project.
