Robert Blust’s The Austronesian Comparative Dictionary - web edition (ACD) has been online for a long time - and actually still is (again) at https://www.trussel2.com/ACD/. However, after the unexpected passing of Steve Trussel, the site was no longer updated and can now be regarded as a legacy resource.
After 10 years of development it seems to be the right time for release 10.0.0 of the clld framework.
And maybe it’s also time for some retrospection.
We recently published TuLar - the Tupían Language Resources site - and the Hindu Kush Areal Typology site. Both are examples of a new kind of clld app - one that aggregates different kinds of linguistic data with an areal focus, rather than collecting data of the same kind (e.g. typological questionnaires) globally. While this kind of customization was always possible within the clld framework, it can now be done more efficiently and in a more principled way. In the following we describe how.
Exploiting CLDF and the enhanced CLDF support in clld 7 made collaboration on retro-digitizing Gauchat’s “Tableaux phonétiques des patois suisses romands” from 1925 easy. Read more in the preprint A digital, retro-standardized edition of the Tableaux phonétiques des patois Suisses romands (TPPSR) and below.
With clld 7.1 out of the door it’s time for an update on CLLD-related activities.
We are happy to announce the release of PHOIBLE 2.0. This release adds almost 1,000 new phoneme inventories for almost 500 new languages over PHOIBLE 1.0. It also includes information about allophones for many inventories.
While this blog has been quiet for some time, this does not mean there is nothing to report. In fact, there has been a lot of activity, ranging from major updates of our servers and our core libraries (e.g. the clld package), to updates of several CLLD databases and work on new standards.
We are happy to announce the release of Glottolog 3.0. This is the first release of Glottolog completely based on data curated in the public GitHub repository clld/glottolog.
We are happy to announce the release of Concepticon 1.0, a resource for the linking of concept lists. Our resource presents an attempt to link the large number of different concept lists which are used in the linguistic literature, ranging from Swadesh lists in historical linguistics to naming tests in clinical studies and psycholinguistics.
A lot of data about languages marks the associated language either using its Glottocode (i.e. its identifier according to Glottolog) or its ISO 639-3 code. So often, when merging data from various sources, the issue of mapping between the two code systems comes up.
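The mapping problem can be illustrated with a minimal Python sketch. The three-entry table below is hand-written for illustration only, and the names `GLOTTOCODE_TO_ISO`, `ISO_TO_GLOTTOCODE` and `to_glottocode` are hypothetical; a real mapping would be derived from a Glottolog data release, and not every languoid has an ISO 639-3 code.

```python
# Illustrative sample only; a real table would be loaded from Glottolog data.
GLOTTOCODE_TO_ISO = {
    "stan1293": "eng",  # English
    "stan1295": "deu",  # German
    "stan1290": "fra",  # French
}

# Inverted lookup; assumes the sample above is one-to-one.
ISO_TO_GLOTTOCODE = {iso: gc for gc, iso in GLOTTOCODE_TO_ISO.items()}


def to_glottocode(code):
    """Normalize a Glottocode or ISO 639-3 code to a Glottocode.

    Returns None if the code is not in the sample table.
    """
    if code in GLOTTOCODE_TO_ISO:
        return code
    return ISO_TO_GLOTTOCODE.get(code)


# Merge language identifiers from two sources that use different code systems.
source_a = ["stan1293", "stan1295"]   # identified by Glottocode
source_b = ["deu", "fra"]             # identified by ISO 639-3 code
merged = sorted({to_glottocode(c) for c in source_a + source_b})
```

Normalizing everything to one code system before merging keeps languages that appear in both sources from being counted twice.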
An abstract describing the CLLD project and how we use GitHub was accepted at FORGE 2015. So last Friday I gave a talk at this event, which aimed at bringing together “digital humanities practitioners” and the emerging Humanities Data Centers.
Two and a half years into the CLLD project we see two main strategies for coping with the versioning problem. In this post I will describe these strategies as exemplified by the WALS and Tsammalex databases, respectively, and investigate their strengths and weaknesses.
On April 30, just before the big closing conference of the Department of Linguistics at the Max Planck Institute for Evolutionary Anthropology, we hosted the second of what we hope will become a series of workshops on Language Comparison with Linguistic Databases.
Two weeks ago we released version 2.4 of Glottolog - our comprehensive language catalog and bibliography. This update includes the addition of 76 languages resulting in moderate corrections of the classification, as well as two changes which are worth highlighting: 68131 new references and a new collaborative curation model.
We are obviously not the first ones to have come up with the idea of using GitHub for collaborative data curation.
Tsammalex, a multilingual lexical database on plants and animals, uses WWF Ecoregions as one facet to navigate species. Thus whenever new species are added to the database, we have to answer the question: which ecoregions are populated by this species?
As it turns out, our predictions for main work packages in 2014 have been rather inaccurate, to say the least.
A problem which recently popped up again, and for which we eventually found a satisfying solution, is described as multilingual sorting in an article about the Oracle database system.
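To illustrate what the multilingual sorting problem looks like, here is a small Python sketch. It is not the solution described in the post; it is a simplified, accent-insensitive sort key built with the standard library, and the function name `sort_key` and the sample names are made up for the example. Language-specific collation rules (for instance, alphabets that sort accented letters after "z") are not handled.

```python
import unicodedata


def sort_key(name):
    """Build a key that ignores diacritics and case.

    The original string is kept as a tie-breaker so the order is stable
    for names that differ only in accents or case.
    """
    # Decompose characters into base letter + combining marks ...
    decomposed = unicodedata.normalize("NFD", name)
    # ... then drop the combining marks.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), name)


names = ["Zulu", "Ågren", "apple", "Émile", "eagle"]

# Plain code point order puts uppercase first and accented letters last.
naive = sorted(names)

# Accent- and case-insensitive order.
collated = sorted(names, key=sort_key)
```

With the plain sort, "Zulu" comes first and the accented names come last; with the key function, each name is placed next to its unaccented neighbours.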
I spent the last couple of days in Nijmegen, having had the good fortune of being invited to the workshop Language Comparison with Linguistic Databases: RefLex and Typological Databases at the Max Planck Institute for Psycholinguistics.
The motto from Field of Dreams, "If you build it, they will come!", is often seen as the wrong attitude towards building web applications in general; its variant "If you publish it on the web, they will come" often works for research data, though. And what's even better, they may provide added value on top of your data!
On September 12, 2014, PHOIBLE Online, the world's largest database of phonological inventories, by Steven Moran, Daniel McCloy and Richard Wright, was published as a CLLD database.
Checking the success of the newest CLLD archive registration with OLAC (the Open Language Archives Community), I thought it was time for some bragging.
With the clld framework it has always been easy to provide custom representations of the resources in a database. As of version 0.15 this mechanism is used in the core framework to provide csv and csv metadata representations for any datatable in a clld app.
ZENODO's integration with GitHub provides a service that brings citeability and usability of CLLD databases for reproducible research to a new level.
Last week saw updates for two of our flagship datasets: Glottolog (our comprehensive language catalog and bibliography) and WALS Online (the World Atlas of Language Structures Online).
For a project setting out to publish Linked Data on the web, this is close to the worst case scenario, but for reasons beyond our control we have to serve the World Loanword Database under a new domain. Starting June 3, 2014, WOLD should be accessed using the domain wold.clld.org instead of the old domain wold.livingsources.org.
Yesterday we published the South American Indigenous Language Structures (SAILS) Online - a large database of grammatical properties of languages gathered from descriptive materials (such as reference grammars) by a team from the Languages in Contact Group (LinC) at Radboud University Nijmegen directed by Pieter Muysken.
A paper describing the CLLD project and the role Linked Data plays in our publishing strategy was accepted at the 3rd Workshop on Linked Data in Linguistics (LDL-2014) - co-located with LREC 2014 - to be held in Reykjavik, Iceland on May 27.
A poster presenting the CLLD project was accepted at DHd 2014 - the first conference of the association "Digital Humanities im deutschsprachigen Raum" - to be held in Passau from March 26 to March 28.
Sebastian Bank from the University of Leipzig shared an IPython notebook which can serve as a tutorial for Exploring Glottolog with Python.
I think this is a fantastic resource using CLLD data the way it was meant to be used, but also showcasing
With people in Zurich and Nijmegen having installed the clld framework successfully (and on different platforms, no less), it was time to make a “real” Python package, i.e. publish clld on the Python Package Index. So here it is: clld on PyPI. This means that you can now install clld simply by running
pip install clld
After the first of four years of funding it is time to review what has been achieved in 2013 and to outline the next milestones of the project.