The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for the ingest of research papers (and potentially other scholarly work) into repositories. Later, the project scope widened to include identifying and disseminating the various projects being developed at institutions in relation to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.

Monday, 25 April 2011

Could external cooperation improve collection of specific JISC MRD project-related information?

  In the coming days SONEX will publish some posts on the JISC MRD Programme International Workshop held on 28-29 March at Aston Business School Conference Centre, Birmingham. Several issues debated at this comprehensive meeting were very useful for establishing a SONEX approach to research data management, as discussed at a SONEX meeting at EDINA on 30 March, whose outcome will also be blogged shortly.

See the report by Brian McMahon (IUCr) for a general review of the JISC MRD workshop.

One of the most visible disciplinary approaches to data management presented at the JISC MRD event, which featured all kinds of institutional and subject-based initiatives in the area, came from meteorology, palaeoclimatology and climate-related sciences: the PEG-BOARD Project (U of Bristol) was presented at the Subject-Oriented Approaches session on Monday, followed by the ACRID (U of East Anglia & STFC) and Metafor (BADC & STFC) Project presentations on Tuesday afternoon.

One of the most relevant features of these climate-related projects is their interdisciplinarity. The PEG-BOARD Project in particular aims to serve the archaeology research community by supplying it with palaeoclimate data.

A few specific aspects of PEG-BOARD were discussed after the project presentation. Interestingly, they were not mentioned during the talk, nor are they reported on the project site:

- Because the project is interdisciplinary, there are two clearly different user groups for the palaeoclimatology data it produces: climatologists, who understand the nature of the datasets involved because they are central to their discipline, and archaeologists, who do not and need not know much about the data format but need the information it contains for their own purposes. Archaeologists thus act as regular non-technical users of the project rather than as researchers. However, since they are in fact researchers, the feedback they can provide on the project outcome could be all the more valuable.

- What archaeologists care about in the end is the data plots, and Data Centres will not provide such processing. So PEG-BOARD implemented specific software capabilities that address the needs of non-technical data users (i.e. archaeologists) and allow them to search for the plots or false-colour graphics they need. This piece of middleware is a conceptually key feature of the project in terms of deliverables.

- Climate data is usually archived in binary format, so it is often not easy to process. The UK Met Office provided a lot of information, often incomplete or in old formats. The process of adapting the raw data to the project's needs was very interesting and worth disseminating.

- Climate models were written in FORTRAN. When they were rewritten or translated into C++, the results would vary for the same data arrays because of the way the code treated them. This poses a considerable challenge for model interpretation.

- When asked whether researchers provided enriched metadata for their data, the answer was that there is usually some input in terms of past experiments, i.e. "this is the data outcome of such-and-such experiment when changing the initial conditions in such a way". That earlier experiment would in turn be described in the same way, until one was reached that wasn't described at all.
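
As a side note on the FORTRAN/C++ point above: the discussion did not go into why the translated models produced different results, but one common cause of this kind of divergence (an assumption here, not something reported by the project) is that floating-point addition is not associative, so two programs that accumulate the same array in a different order can disagree. A minimal Python sketch, with a hypothetical helper name and made-up values:

```python
def accumulate(values):
    """Sum values strictly left to right, as a plain loop would."""
    total = 0.0
    for value in values:
        total += value
    return total

# The same three numbers, accumulated in two different orders.
# These values are illustrative only; they are not PEG-BOARD data.
order_a = [1e16, 1.0, -1e16]   # the 1.0 is absorbed by 1e16 before cancellation
order_b = [1e16, -1e16, 1.0]   # the large terms cancel first, so the 1.0 survives

print(accumulate(order_a))
print(accumulate(order_b))
```

The two calls return different totals even though the inputs are the same set of numbers. In a long-running climate model such small discrepancies can be amplified over many time steps, which would be consistent with the difficulty described at the workshop.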

The fact that none of these project aspects is recorded or discussed on the project blog raises the question of whether an external approach to data management projects might collect and disseminate very interesting information that researchers may not consider relevant enough to discuss on project blogs. Such an external approach to running projects might be carried out by data librarians in order to share these specific project details with the data management community.

For what it's worth, Sonex would be keen to do this kind of job for the MRD community.

1 comment:

  1. Good idea, Pablo. You can often get so close to things that you don't realise what may not be obvious to others.

    The role you describe is crucial to make sure the important lessons and findings coming out of these projects are shared.

    Might be something the DCC could pitch in to as well.

    Sarah Jones