Data Publishing

Tuesday, 31 October, 2006

The GRADE project, on which I am a collaborative partner, is concerned with scoping geospatial repositories. The project has principally been tackling the legal and technical issues surrounding their establishment and, I think, has made some very good progress. Yet behind all this work you actually need people to deposit data in a repository. And this is where the rub is. At the moment we have data centres (a buzzword 5-10 years ago) and we are now seeing more and more institutional repositories being established. Yet where is the impetus for actually depositing data? I suspect that this is partly subject specific. My impression is that subjects such as physics have a greater tendency to share data. In the geosciences it's usually a case of keeping what you have collected and only ever publishing the results, not the data itself. To be fair, this is beginning to change, with the research councils in the UK requiring the deposition of data from funded work. But how much research data actually results from research council funding? My impression is less than half (although if anyone has any figures, that would be interesting).

So we have a situation where there is a "top down" establishment of repositories, but no one is actually interested in using them. We have researchers collecting data (for research), but it is research publications that drive the agenda (NOT the data). I know that I see absolutely no reason why I should share primary data and, indeed, I like to discuss potential uses with people before sharing. Then, of course, we have the vested interests of the institutions that employ researchers. They directly or indirectly fund much of this research and there is increased interest in "monitoring potential assets" (although quite to what extent institutions have a claim to IPR is another matter).

So where does that actually leave things? Well, Mahendra Mahey (at the GRADE meeting this week) provided a summary of repository work in the UK and briefly summarised some points that Pete Burnhill (Director of EDINA) was making along these lines. And that is that data should be published. As a community, academics need to be made aware of the positive aspects of data sharing and to see it as an opportunity to publish. Indeed, one could argue that data publication should be seen as a valid publication route. And in the same way that journal articles are peer reviewed, so data should be too. This is a route that we have been toying with at the Journal of Maps. Several articles have data published with them (e.g. Stokes et al.). These data have been checked for appropriateness but not explicitly reviewed in the same manner as the article. I am currently reviewing how useful this "service" is, with the potential to ask reviewers to comment on submitted data, as well as having a separate data reviewer. This actually raises a whole host of other questions concerning data preservation (as opposed to a repository), which I won't comment on at the moment.

From the above comments, I think it is clear that I'm in favour of data publication, but at the moment I am inclined to think that the data should follow the research (hence publishing the data with the article at the Journal of Maps). The problem with separating data and content is that maintaining the explicit link between the two becomes more complex (just look at journals from the 19th century to see how effective immediacy is). Keeping the data with the article also makes the peer review process much simpler. That isn't to say that data can't be stored in a repository, but that, in the first instance, it might be better placed with the article. Indeed, I could see the research councils' requirement for copies of publications and data deposition being taken a stage further, requiring research articles to have data published with them. Clearly the emphasis would then shift to the journals, many of which will not be well placed to deal with it. However, the whole research publication ethos is changing (e.g. open access) and it is time that journals became proactive. Indeed, with Wiley and Elsevier being so prominent (and supporting things like permanent electronic archives), it would only require these two organisations to support such an initiative for it to really take off. Whilst in principle it sounds a reasonable idea, there are many barriers, not least the sheer volume of some data sets within a web-based infrastructure where most journals struggle to offer more than a static PDF.
