While this blog has been quiet for some time, this does not mean there is nothing to report. In fact, there has been a lot of activity, ranging from major updates of our servers and our core libraries (e.g. the clld package) to updates of several CLLD databases and work on new standards.
We are happy to announce the release of Concepticon 1.0, a resource for linking concept lists. Concepticon is an attempt to link the large number of different concept lists used in the linguistic literature, ranging from Swadesh lists in historical linguistics to naming tests in clinical studies and psycholinguistics.
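To illustrate what "linking" concept lists can mean, here is a minimal sketch. It assumes that linking maps each entry of a list to a shared concept set, so that two lists with differently worded glosses can be compared. The concept-set labels, the `CONCEPT_SETS` table and the normalization rules are all hypothetical and far simpler than the real Concepticon mappings.

```python
import re

# Hypothetical lookup table: normalized English gloss -> concept-set label.
# These labels are for illustration only, not actual Concepticon data.
CONCEPT_SETS = {
    "hand": "HAND",
    "water": "WATER",
    "to see": "SEE",
    "see": "SEE",
}


def normalize_gloss(gloss):
    """Drop parenthetical comments, lowercase, and strip a leading article."""
    gloss = re.sub(r"\(.*?\)", "", gloss).strip().lower()
    gloss = re.sub(r"^(the|a|an)\s+", "", gloss)
    return " ".join(gloss.split())


def link(concept_list):
    """Map each original gloss to a concept-set label, or None if unmatched."""
    return {g: CONCEPT_SETS.get(normalize_gloss(g)) for g in concept_list}


def shared_concepts(list_a, list_b):
    """Concept sets that both lists link to."""
    linked_a = {c for c in link(list_a).values() if c}
    linked_b = {c for c in link(list_b).values() if c}
    return linked_a & linked_b


swadesh_like = ["the hand", "Water", "to see"]
naming_test = ["hand (body part)", "see", "tree"]
print(shared_concepts(swadesh_like, naming_test))  # {'HAND', 'SEE'} (set order may vary)
```

Unmatched glosses such as "tree" map to `None`; in practice these are the entries that need manual review.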
A lot of data about languages identifies the language in question either by its Glottocode (i.e. its identifier in Glottolog) or by its ISO 639-3 code. So when merging data from various sources, the issue of mapping between the two code systems often comes up.
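A minimal sketch of such a mapping, assuming the pairs come as CSV with hypothetical columns `glottocode` and `isocode` (not the layout of any official Glottolog download) and with made-up codes. It builds lookup tables in both directions, skips languoids without an ISO code (such as dialects), and refuses to silently accept an ISO code assigned to two different Glottocodes.

```python
import csv
import io

# Hypothetical sample data; codes and column names are made up for illustration.
SAMPLE = """glottocode,isocode
abcd1234,abc
efgh1234,
ijkl1234,ijk
"""


def build_mappings(text):
    """Return (glottocode -> ISO 639-3, ISO 639-3 -> glottocode) dictionaries."""
    glotto_to_iso, iso_to_glotto = {}, {}
    for row in csv.DictReader(io.StringIO(text)):
        glottocode = row["glottocode"].strip()
        iso = row["isocode"].strip()
        if not iso:
            # Many languoids (e.g. dialects or families) have no ISO code.
            continue
        if iso_to_glotto.get(iso, glottocode) != glottocode:
            raise ValueError(f"ISO code {iso} assigned to several Glottocodes")
        glotto_to_iso[glottocode] = iso
        iso_to_glotto[iso] = glottocode
    return glotto_to_iso, iso_to_glotto


g2i, i2g = build_mappings(SAMPLE)
print(g2i)  # {'abcd1234': 'abc', 'ijkl1234': 'ijk'}
```

Raising on conflicts is a design choice: when merging several sources, a loud failure is usually preferable to one source quietly overwriting another.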
An abstract describing the CLLD project and how we use GitHub was accepted at FORGE 2015. So last Friday I gave a talk at this event, which aimed at bringing together “digital humanities practitioners” and the emerging Humanities Data Centers.
Two and a half years into the CLLD project, we see two main strategies for coping with the versioning problem. In this post I will describe these strategies, as exemplified by the WALS and Tsammalex databases respectively, and investigate their strengths and weaknesses.
On April 30, just before the big closing conference of the Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology, we hosted the second of what we hope will become a series of workshops on Language Comparison with Linguistic Databases.
Two weeks ago we released version 2.4 of Glottolog, our comprehensive language catalog and bibliography. This update adds 76 languages, resulting in moderate corrections to the classification, and brings two changes worth highlighting: 68,131 new references and a new collaborative curation model.
Tsammalex, a multilingual lexical database on plants and animals, uses WWF Ecoregions as one facet for navigating species. So whenever new species are added to the database, we have to answer the question: which ecoregions are populated by this species?
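One way to frame this question is as a point-in-polygon test: given occurrence coordinates for a species, find every ecoregion polygon that contains at least one of them. The sketch below assumes such coordinates are available and uses two hypothetical square "ecoregions"; real WWF ecoregions are complex multipolygons, usually handled with a GIS library rather than hand-written geometry. It is not necessarily how Tsammalex does it.

```python
# Hypothetical ecoregions as simple (lon, lat) polygons, for illustration only.
ECOREGIONS = {
    "REGION_A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "REGION_B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle on each polygon edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def ecoregions_for(occurrences):
    """Names of all ecoregions containing at least one (lon, lat) occurrence."""
    return {
        name
        for name, polygon in ECOREGIONS.items()
        for lon, lat in occurrences
        if point_in_polygon(lon, lat, polygon)
    }


print(ecoregions_for([(5, 5), (15, 5)]))  # {'REGION_A', 'REGION_B'} (set order may vary)
```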
As it turns out, our predictions for the main work packages in 2014 were rather inaccurate, to say the least.
A problem that recently came up again, and for which we eventually found a satisfying solution, is what an article about the Oracle database system calls multilingual sorting.
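To make the problem concrete: a plain code-point sort puts "Ábaco" and "éclair" after "zebra", because accented letters have higher code points than unaccented ones. A common, simple workaround is a sort key that decomposes characters (Unicode NFD), strips combining marks and case-folds, with the original string as a tie-breaker. This is only one generic technique, not necessarily the solution the post arrived at; proper multilingual collation is language-specific (see the Unicode Collation Algorithm).

```python
import unicodedata


def base_key(s):
    """Sort key ignoring diacritics and case; ties broken by the original string."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), s)


words = ["zebra", "Ábaco", "abaco", "éclair", "eagle"]
print(sorted(words))                # code-point order: accented words end up last
print(sorted(words, key=base_key))  # ['abaco', 'Ábaco', 'eagle', 'éclair', 'zebra']
```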
I spent the last couple of days in Nijmegen, having been fortunate enough to be invited to the workshop Language Comparison with Linguistic Databases: RefLex and Typological Databases at the Max Planck Institute for Psycholinguistics.
The motto from Field of Dreams
If you build it, they will come!
is often seen as the wrong attitude towards building web applications in general; its variant "If you publish it on the web, they will come" often works for research data, though. And, even better, they may provide added value on top of your data!
On September 12, 2014, PHOIBLE Online, the world's largest database of phonological inventories, by Steven Moran, Daniel McCloy and Richard Wright, was published as a CLLD database.
With the clld framework, it has always been easy to provide custom representations of the resources in a database. As of version 0.15, the core framework uses this mechanism to provide CSV and CSV metadata representations for any datatable in a clld app.
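As a rough picture of what a "CSV plus CSV metadata" representation of a table involves, here is a framework-independent sketch. It is not clld's API: `to_csv` and `to_csv_metadata` are hypothetical helpers, and the metadata layout is only loosely modelled on the W3C CSV on the Web vocabulary (`url`, `tableSchema`, `columns`), which is an assumption about the format clld emits.

```python
import csv
import io
import json


def to_csv(columns, rows):
    """Serialize a header row plus data rows as CSV text (quoting where needed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def to_csv_metadata(url, columns):
    """Hypothetical metadata document describing the columns of a CSV file."""
    return {
        "url": url,
        "tableSchema": {"columns": [{"name": c, "titles": c} for c in columns]},
    }


if __name__ == "__main__":
    columns = ["id", "name"]
    rows = [["1", "Alpha, Beta"], ["2", "Gamma"]]
    print(to_csv(columns, rows))
    print(json.dumps(to_csv_metadata("languages.csv", columns), indent=2))
```

Keeping the metadata separate from the CSV leaves the data file usable by any spreadsheet tool, while the metadata document carries the column descriptions.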
ZENODO's integration with GitHub provides a service that takes the citability and usability of CLLD databases for reproducible research to a new level.
For a project setting out to publish Linked Data on the web, this is close to the worst-case scenario, but for reasons beyond our control we have to serve the World Loanword Database under a new domain. Starting June 3, 2014, WOLD should be accessed using the domain wold.clld.org instead of the old domain wold.livingsources.org.
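Part of what softens a domain move for Linked Data is keeping old URIs resolvable: a request to the old host can be answered with a permanent redirect to the same path on the new host. The sketch below shows only that URL rewriting; the function name and the example path `/vocabulary/1` are hypothetical, and the post does not say how the redirect is actually set up.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "wold.livingsources.org"
NEW_HOST = "wold.clld.org"


def redirect(url):
    """Return (status, location) for URLs on the old host, or None otherwise."""
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return None
    # Keep scheme, path and query so every old URI maps to its new equivalent.
    new_url = urlunsplit((parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment))
    # 301 Moved Permanently tells clients and crawlers to update stored links.
    return 301, new_url


print(redirect("http://wold.livingsources.org/vocabulary/1?x=1"))
# (301, 'http://wold.clld.org/vocabulary/1?x=1')
```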
Yesterday we published the South American Indigenous Language Structures (SAILS) Online - a large database of grammatical properties of languages gathered from descriptive materials (such as reference grammars) by a team from the Languages in Contact Group (LinC) at Radboud University Nijmegen directed by Pieter Muysken.
A paper describing the CLLD project and the role Linked Data plays in our publishing strategy was accepted at the 3rd Workshop on Linked Data in Linguistics (LDL-2014) - co-located with LREC 2014 - to be held in Reykjavik, Iceland on May 27.
A poster presenting the CLLD project was accepted at DHd 2014 - the first conference of the association "Digital Humanities im deutschsprachigen Raum" - to be held in Passau from March 26 to March 28.
I think this is a fantastic resource using CLLD data the way it was meant to be used, but also showcasing
With people in Zurich and Nijmegen having installed the clld framework successfully (and on different platforms, no less), it was time to make a “real” Python package, i.e. to publish clld on the Python Package Index. So here it is: clld on PyPI. This means that you can now install clld simply by running
pip install clld
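After installing, a quick way to confirm that the package is visible to your Python environment is to ask the standard library for its installed version (Python 3.8 or later). The helper name `installed_version` is hypothetical; only `importlib.metadata` is standard.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="clld"):
    """Return the installed version string of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version())  # a version string if clld is installed, else None
```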
After the first of four years of funding, it is time to review what has been achieved in 2013 and to outline the next milestones of the project.