We are obviously not the first ones to have come up with the idea of using GitHub for collaborative data curation.
What GitHub and the pull request have done for open source software development is simply too good to lose out on when it comes to research data curation: They created a global community around a:
which lowers the level of expertise for potential contributors to what seems the optimal threshold to divide signal from noise.
Now, knowing how to use a version control system may not quite be the optimal threshold when trying to encourage researchers to contribute to collaborative curation projects. But it should be safe to say it’s on its way into the curriculum.
So what is our take on using GitHub for collaborative data curation? First of all, we see it as a way to formalize procedures which have been part of data curation at all times:
Note: While some advantages of using git hinge on using line-based text formats for the data, preferably not cluttered with markup (csv, BibTeX, JSON if pretty-printed), most of the points hold for binary data as well. But again, missing out on the level of support provided by a mature system such as git would seem foolish.
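To make the point about line-based formats concrete, here is a minimal sketch in Python of why pretty-printed JSON plays better with git than compact JSON. The record, its field names and the helper functions `to_diffable_json` and `changed_lines` are made up for illustration; they are not taken from any of our repositories. The idea is simply to write one key per line in a stable order, so that a small edit shows up as a small diff.

```python
import difflib
import json


def to_diffable_json(record):
    """Serialize with one key per line and a stable (sorted) key order."""
    return json.dumps(record, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def changed_lines(old_text, new_text):
    """Return only the added/removed lines of a unified diff, without headers."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Hypothetical record; the field names and values are assumptions.
before = {"id": "abcd1234", "name": "Example language", "level": "language"}
after = dict(before, name="Example Language")

# Pretty-printed: the diff touches only the line holding the edited field.
pretty = changed_lines(to_diffable_json(before), to_diffable_json(after))

# Compact: the whole record sits on one line, so the diff repeats every field.
compact = changed_lines(json.dumps(before), json.dumps(after))

for line in pretty:
    print(line)
for line in compact:
    print(line)
```

In the pretty-printed case a reviewer of a pull request sees just the field that changed, while in the compact case the entire record is flagged as modified and the actual edit has to be hunted down by eye.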
But you can go further:
So considering all this and trying to eat our own dog food, we are happy to make Glottolog (our flagship when it comes to integrating data from multiple sources) available for this type of collaboration: clld/glottolog-data.