The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for ingest of research papers (and potentially other scholarly work) into repositories. Later on, the project scope widened to include the identification and dissemination of various projects being developed at institutions in relation to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.






Showing posts with label Research data. Show all posts

Friday, 13 April 2012

Democratizing research data management


"A bit of competition would certainly do no harm to institutionally-driven RDM projects"
(JISC MRD Project representative during a conversation on LabArchives)

  A press release was published last week by BioMed Central announcing its partnership with LabArchives to provide all BMC journal authors with access to an enhanced version of this Electronic Lab Notebook. This enhanced version of LabArchives, with a default 100 MB of storage, will allow researchers to assign DOIs to every dataset submitted as supplementary material to any BMC title. LabArchives is however not specifically aimed at supplementary data management: on the one hand, the platform has a publisher-oriented side for supplementary dataset submission; on the other hand, LabArchives could also be used as a standard tool for general-purpose research data management. This feature offers researchers the opportunity to use an RDM tool regardless of their institutional affiliation, scientific discipline or the country they are working in.


Since the press release was published on Apr 4th, i.e. just before the Easter holidays, there hasn't been much discussion (yet) of its potential implications for research data management. However, this commercial software may provide an additional means of doing RDM to all research groups in the UK currently not covered by a JISC MRD project or a specific institutional data policy. Besides this, in those countries where no particular emphasis is being placed on the need for RDM initiatives, this tool might be a very useful way to promote RDM directly among researchers, removing the need for institutional data policies, funder mandates on data deposit and even support from data librarians. If we consider the double bottleneck currently preventing RDM activities from succeeding in many countries -a top-down one created by the lack of official commitment to RDM and a bottom-up one at understaffed institutional repository management teams- this BMC-LabArchives partnership could mean something close to a revolution in research data management if properly disseminated to authors, research groups and institutions.

There are of course other RDM platforms around as of today, such as figshare, Dryad or the growing data repository network, but with LabArchives offering researchers the opportunity to publish the data they decide to share (including DOI assignment), a new way has in fact been opened for performing RDM at big and small HEIs. 100 MB -or even the 100 GB of storage offered by the LabArchives subscription-based professional version- may not seem much storage for certain disciplines, but it will certainly serve the needs of many others, and LabArchives may also be installed locally for those centres with larger storage requirements.

While some institutional approaches to RDM infrastructure creation include the development of in-house built RDM platforms, many others couldn't possibly afford the cost of such a task. In this sense, LabArchives means the opportunity to democratize the management of research data. The main requirement for LabArchives to succeed as a fully functional alternative RDM tool is now to ensure its interoperability with other well-known data management platforms such as Dryad or the institutional data repository network. Once it achieves that, it may become a formidable competitor to the JANET-brokered UMF cloud-based infrastructure for data management - and indeed a very useful complement to it.

Saturday, 24 March 2012

Northwest England DCC roadshow at the University of Salford: a report



  Although not directly related to current Sonex work on analysing requirements for dataset transfer via Sword, the international workgroup was interested in attending a DCC roadshow to gather a view on -and provide its own input to- RDM-related training initiatives that complement direct institutional experience in RDM acquired through JISC MRD projects. So when a chance showed up to attend the Northwest England DCC roadshow at the University of Salford, we were happy to engage with the UK Digital Curation Centre to be there at the University Library on March 20th and 21st.

Training initiatives regarding research data management were also thoroughly discussed at the 'Research Data Management: Activities and Challenges' workshop organised by the Knowledge Exchange Primary Data Workgroup in Bonn last November, which Sonex also attended (and provided a report for), so there were opportunities in Salford for identifying synergies between national and international RDM initiatives in this regard. This was also the right occasion for highlighting an example of best practice in RDM-related training activities based on local network building and promoting extensive debate on where to start and how to carry on with the work, while disseminating the appropriate tools to do it along the way.

The DCC roadshow proved to be a very effective complement indeed to the JISC MRD programme and other RDM-related initiatives for reaching the 'common university' - i.e. those where preliminary efforts -be it at researcher survey level- are taking place to build some kind of RDM infrastructure but with no particular 'official' support outside -sometimes even inside- their institutions. The event in Salford gathered representatives from many NW universities -Salford, Manchester, Liverpool, Sheffield, Leeds- and debates during the roadshow were very much enriched by the mixture of institutional profiles attending it, from librarians to research office managers to researchers to ethics committee members. The experienced DCC roadshow team -Martin Donnelly, Andrew McHugh and Patrick McCann- were also very efficient in promoting dialogue and passing on guidelines and expertise throughout the event.

A shorter schedule was applied to this NW England roadshow, which took just two days instead of three: a first day devoted to presentations on data management initiatives taking place in the region and a second day for group discussions on research data management needs and how to use tools provided by DCC to identify them (such as DAF, Cardio or DMPOnline). The DCC Data management roadshow is also an evolving creature and there are slight variations in content among its different editions, meaning that the event focus can be adapted to different levels of regional RDM implementation: emphasis can for instance be placed on advanced RDM tools such as DMPOnline where RDM initiatives are well under way, while mainly focusing on the Data Assessment Framework initiative in regions where RDM is still at a preliminary implementation stage.

Highlights from the first day included a stimulating 'Towards Open Worlds' keynote speech by Professor Martin Hall, a Vice Chancellor showing an unusually high commitment to Open Access and keen to debate related issues with the audience. An inspiring presentation of the two-stage MaDAM/MiSS JISC MRD project at U of Manchester was also delivered by Meik Poschen, providing some guiding light for preliminary initiatives in RDM currently being carried out at other institutions. Finally, Day I sessions were closed with a four-expert panel discussion, in which presenters at the event were asked to stress a specific issue in RDM they considered worth deeper examination. The answers were: Cost model (Meik Poschen, UoM), Limits to researcher time availability (Rachel Kane, U Sheffield), Who should lead RDM tasks - is the Library able to? (Julie Berry, Salford U) and Research motivation as a decisive argument (Graham Pryor, DCC).

Along subsequent discussions Sonex became aware of three relevant points:

- Benefits can arise regarding these issues from a deeper analysis of international RDM initiatives -including ongoing and forthcoming European projects- connected to the institutional activity in the area,

- Besides disseminating specific funder mandates, finding a way for estimating the institutional costs derived from universities not managing their research data could be a potentially very effective argument for engaging universities with RDM activity.

- There is much emphasis in discussions on how to train researchers, but not so much on how to set up and train a team of dedicated data librarians - this strongly depending on Library staff figures and on whether or not librarians see themselves as fit for the task.


On the roadshow Day II the EPSRC policy framework on RDM and its implications for RDM strategy implementation at universities and research centres were discussed, and several joint RDM planning activities were carried out by different groups using DCC tools for examining aspects such as the benefits to be obtained from RDM, where each institution stands in terms of RDM implementation strategy or how to engage research groups and university management more deeply in RDM.

From a Sonex point of view, attending the roadshow proved very useful for identifying successful models of RDM training and dissemination, and we would humbly recommend giving this RDM training initiative an international profile once complete so that similar efforts may be applied to a broader context. It would also be useful if participants in the DCC roadshows could provide feedback, a few months afterwards, on the impact that taking part in the initiative had on their institution's work on RDM implementation. Any future reporting from the DCC roadshow team on their initiative will be a very interesting read indeed and we shall be following dissemination initiatives outside the UK -such as the talk on DMPOnline at the Future Perfect 2012 Conference in Wellington this week- and hoping they'll soon arrive in continental Europe, where their work on Data Management Plans may be particularly valuable in the near future.




Tuesday, 7 February 2012

Report on the Knowledge Exchange Workshop on RDM released


  The report on the Workshop on Research Data Management held last November by Knowledge Exchange (KE) at the Wissenschaftszentrum Bonn has already been released. This report summarizes expert group discussions on RDM funding, training, infrastructure and organisation challenges held after the KE "A Surfboard for 'Riding the Wave'" report was presented at the Workshop.



Tuesday, 13 December 2011

Thematic parallel session on metadata - actions to be taken


  On Day II of the JISC MRD Programme 2011-13 launch event in Nottingham, last Dec 2nd, specific subject-based discussion sessions were held among the different JISCMRD02 Projects for research data management in order to promote synergies and joint work on common issues. This is a brief report on the outcomes of such discussions at the parallel session on metadata - others were simultaneously held for Institutional, Life Sciences, Engineering or Archaeology MRD projects, whose discussions have been reported elsewhere (and there are also other posts summarizing talks for this one too).

It was really hard for some of us to pick a single one of those groups, since many projects actually belonged to several strands (some lucky ones also had two representatives at the event, it should be noted). The session on metadata was attended, among others, by:

- Anna Clements (U St Andrews)
- Simon Kerridge (U Sunderland)
- Kevin Ginty (U Sunderland)
- Charlotte Pascoe (British Atmospheric Data Centre)
- Pablo de Castro (SONEX Workgroup)
- Simon Hodson (JISC MRD Programme manager)
- David Shotton (U Oxford)
- Louise Corti (UK Data Archive)
- Marco Fabiani (Queen Mary U London)
...


Discussion

Metadata standards were repeatedly discussed during the session - there was a joint (and unsuccessful) attempt to recall whether anyone knew about a metadata standard registry available for different disciplines. Representatives from the CERIF4Datasets Project, University of Sunderland, mentioned they were using the MEDIN metadata standard for their work in marine sciences data management. The Core Scientific Metadata Model (CSMD) standard, developed at STFC for the I2S2 Project, was also mentioned as an interesting approach to a multi-disciplinary metadata standard for structural sciences such as Chemistry, Materials Sciences, Earth Sciences or Biochemistry. Finally, the PIMMS Project (BADC/U Reading) mentioned Metafor as a Climate Science metadata standard and their goal of using the PIMMS software tool to generate CIM-based content.

At some point the idea caught on that metadata standards should perhaps be mandated by publishers in order to harmonise discipline-specific data description procedures. Publishers are actually involved in several very successful international RDM projects, such as Dryad, but -save for REWARD- are significantly missing in JISCMRD02 projects.

Having previously developed the Semantic Publishing and Referencing (SPAR) Ontologies, David Shotton said he was now working on their extension to CERIF-based metadata description of datasets, which is closely linked to dataset CERIFication work being carried out at the CERIF4Datasets Project.


Actions

The following actions were proposed for improving the chances of metadata standard harmonisation - hence enhancing dataset discoverability:

  • Trying to locate (or otherwise collect) an already existing registry of metadata standards for different disciplines, in order to offer researchers from a given discipline an already tested metadata schema they can re-use,

  • Mapping metadata standards to each other aiming to produce a minimum-sufficient-information metadata set that may be widely applicable across disciplines,

  • Taking steps towards organising a workshop in order to have metadata issues discussed among relevant stakeholders. ANDS Metadata Workshop in 2010 might be a potential source of inspiration for this with all those discipline-based approaches to metadata standards. Proposed dates for this Metadata WS were spring-summer 2012.
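
As a rough illustration of the mapping action above, the following Python sketch crosswalks records from two discipline schemas onto a small shared core set. Everything in it is an assumption made for the example: the schema names, the field names and the three core fields are hypothetical and are not taken from MEDIN, CSMD, CIM or any other standard mentioned in the session.

```python
# Hypothetical crosswalk tables: source field -> core field.
# None marks a discipline-specific field with no core equivalent.
CROSSWALKS = {
    "schema_a": {"title": "title", "originator": "creator", "pub_date": "date"},
    "schema_b": {
        "name": "title",
        "investigator": "creator",
        "released": "date",
        "instrument": None,
    },
}

# Hypothetical minimum-sufficient-information set shared by all disciplines.
CORE_FIELDS = ("title", "creator", "date")


def to_core(record, schema):
    """Translate a discipline-specific record into the shared core fields."""
    mapping = CROSSWALKS[schema]
    core = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is not None:
            core[target] = value
    return core


def missing_core_fields(core):
    """List the core fields a translated record still lacks."""
    return [name for name in CORE_FIELDS if name not in core]


record = {"name": "Sea surface scans", "investigator": "A. Researcher", "instrument": "MRI"}
core = to_core(record, "schema_b")
print(core)
print(missing_core_fields(core))
```

A table-driven design like this keeps each discipline's mapping in one place, so a registry of standards could in principle be extended by adding a new table rather than new code.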


Finally, there was a wrap-up by different subject-based project groups which showed strong possibilities for a more stable cooperation among them (Biomedical/Healthcare projects even discussed the possibility of building a common wiki). Some cooperation frameworks (googlegroups, mailing lists) might be set up for promoting this cross-project collaboration within disciplines. Regarding the metadata strand, it should be noted it was also an issue in discussions held at most subject-specific workgroups, so it would potentially allow contributions from all of them.

Friday, 2 December 2011

The dawn of a new JISC MRD programme - Day I



  After a successful first stage of the JISC Managing Research Data (MRD) Programme (2009-2011), a second phase of JISC MRD was launched yesterday at the NCSL Conference Centre in Nottingham, along a 2-day event that will continue today. JISC MRD02 Programme includes 27 projects classified in three different strands:

Strand A. Research Data Management Infrastructure: 17 projects, to be completed from Mar to Jul 2013, comprising Institutional Pilot projects, Institutional Embedding and Transition to Service projects, Disciplinary projects for creative arts and archaeology, and a Metadata project,

Strand B. RDM Planning: 8 projects running until Mar 2012, aiming to design and implement data management plans and supporting services for researchers,

Strand C. Enhancing DMPOnline projects: 2 projects, aiming to customize and enhance the DCC DMPOnline Tool to improve its interaction with institutional/disciplinary information systems.

It is worth noting that a number of funded RDM projects in this 2nd programme stage are building upon previous pilot work (projects carried out under the JISC MRD programme 2009-2011) in order, for instance, to extend and embed data management services across the whole institution.

On describing the research data management programme, Simon Hodson, JISC MRD programme manager, mentioned there will be two further JISC MRD calls as early as Jan 2012, dealing with:

- Research data publications, aiming to build partnerships among involved stakeholders and encouraging data citation and publication,

- RDM Train, aiming to design and implement data management training strategies for specific disciplines and support roles (including librarians), to be performed by linking to professional bodies.

Emphasis will also be placed during this 2nd JISC MRD programme stage on evidence gathering for project benefits and impact. A session devoted to these issues will be held on Dec 2nd, with practical work with both the Benefits Framework Tool and the Value Chain Impacts Tool. Developing metrics for measuring project impact is a specific programme goal for this 2nd implementation stage.

Project blogging

Another JISCMRD02 main objective -and closely related to impact measurement- is promotion of project dissemination and interaction among themselves and with the broader community via blogging. A specific presentation on 'blogging practices to support project work' was delivered for the purpose by Brian Kelly, UKOLN. The presentation highlighted the relevance of publishing project blogposts as an alternative means of expression to writing research papers or code, and engaged the audience in finding shared views regarding potential benefits blogging may bring to RDM projects, also providing some useful technical advice along the way.

Subsequent discussion focused on pros and cons of blogging as a communication technique (both from regular bloggers' and researchers' viewpoint), as well as on potential advantages of JISCMRD project blog aggregation, with a common RSS feed embedded back into the JISC site.

Parallel sessions and poster-session networking

Two parallel sessions came afterwards, dealing with two principal RDM issues: a first one on DCC Tools, introducing Data Asset Framework (DAF), DMPOnline and CARDIO, and summarized by Paul Stainthorp, U Lincoln, on his JISCMRD02 Day I blogpost.

The 2nd parallel session dealt with UMF Tools and related RDM projects. This 2nd session featured presentations by John Milner on JANET Brokerage and Andy Powell on Eduserv Cloud Pilot, in which the strategy for Academic Cloud service implementation was described - based on the "work with the willing" driving line. The Dynamic Purchasing System (DPS) -originally developed for utilities such as water or light- will be re-used as the purchasing framework for cloud-related services. Regarding Eduserv, a 2-month 'introductory tier' will be available (just for institutions) during the gradual implementation of the service (storage being currently single-site, with no backups at this pilot stage, though there are plans for offering tape backup for part of the stored infrastructure).

After an interesting Q&A time, in which backup was suggested to be an absolute requirement for the success of the initiative and there were questions on various Eduserv use mode details (such as the possibility of using departmental orders/purchase order instead of credit cards for academic use), five projects from the UMF strand were briefly presented which are already working either based on a SaaS approach or in the cloud, or both: these were BRISSkit (Jonathan Tedds, U Leicester), DataFlow (David Shotton, U Oxford), Smart Research Framework (or ELB software as a service, Tim Parkinson, U Southampton), VIDaaS and YouShare Projects. Slides for these presentations will shortly be available and will be linked from here.

Finally, the Day I official programme ended with a poster session and networking event, which was a really good opportunity for RDM projects to interact with each other and with 'fellow travellers'. Synergies among projects became quite evident when having them all displayed together on a set of panels, and having their representatives available and willing to discuss each project's aims, challenges and similarities to others offered a very good chance to get the general picture along with the details, as well as for establishing inter-project liaisons that went well past closing time.



Wednesday, 19 October 2011

MaDAM: A JISC MRD Project for Research Data Management in the Biosciences... on the move


  Being in Manchester for the JISC Research Information management (RIM2) event, Sonex didn’t miss the opportunity it provided for paying a visit to the University of Manchester John Rylands University Library and meeting the JISC MRD MaDAM Project team. The 'MaDAM Pilot data management infrastructure for biomedical researchers at University of Manchester' has been funded by the JISC Managing Research Data Programme from Oct 2009 to Jun 2011 and has provided an inspiring example on how to start building an institutional research data management infrastructure almost from scratch.

In order to start developing this RDM infrastructure (see the Project Final Report for details), MaDAM focused on a set of research groups from the biomedical sciences strand aiming to learn about the ways they dealt with data management and to provide them -with their own close involvement- with tools to improve and standardise such practices. Selected research groups -Electron and Standard Microscopy group and Magnetic Resonance Imaging (MRI) Neuropsychiatry Unit- were chosen due to their common need to deal with large images as their main source of research data.

The project's focus on a rather narrow research scope was one of the keys to its success - due to its resulting ability to define common ways for dealing with the information, e.g. at metadata level. The MaDAM planning included further RDM strategy extension to other research groups within the UoM based on the lessons learnt from its application to the few initially selected groups. The MiSS Project (MaDAM into Sustainable Service), funded by the JISC MRD Programme 2011-2013, will be dealing with the extension and widening of the RDM strategy to the whole of UoM research over the coming years.


An Oracle APEX-based research data management application was developed by MaDAM for the UoM research groups concerned -later to be revamped in order to adapt it to the regular software standards applied at UoM. Frequent meetings were held with researchers during the application development so their feedback could be collected to ensure it would meet their needs. Storage needs per researcher per year were estimated (at around 500 GB), a metadata standard for specific data description was devised and stored in the RDM application, and work was carried out with interoperability issues in mind, both with the University CRIS in order to automatically populate Grant and Project information attached to datasets, and with the UoM Fedora-based eScholar IR, where final-version datasets would be transferred via Sword for dissemination, sharing and re-use.


Along the MaDAM Project several conceptual needs regarding the implementation of a solid RDM infrastructure across the UoM (and beyond) were identified -which were later included in the Project Final Report- the main two of which are the following:

- Some means of academic recognition of data-related work by researchers should be put in place in order to promote their involvement in RDM schemas and the adoption of common practices,

- A research data management policy should be adopted by the University of Manchester similar to the one issued at U Edinburgh so that some guidelines are established for providing support to researcher RDM tasks.

MaDAM gradual roll-out to other UoM research groups will face a set of challenges, research data being so discipline-specific. However, plans for such an extension and for ensuring the required institutional support for such a move were designed along MaDAM development -which saw the interest in taking part in the pilot project by a number of additional UoM research groups- and extension work will start soon.

Friday, 14 October 2011

CERIFying Research Information Systems... and Research Data


  A couple of weeks ago Sonex was attending the JISC Research Information management (RIM2) event at MCC Manchester. It was a very good opportunity to review the four JISC-funded projects (BRUCE at Brunel, IRIOS at Sunderland, CERIFy at UKOLN and MICE at KCL) dealing with CERIF implementation for research information management purposes. A report for the event should be shortly available, along with the slides presented at the event.

Along this one-day meeting the CERIF for Datasets (C4D) Project was mentioned as an IRIOS Project extension to dataset management at the University of Sunderland. As stated in the project presentation, C4D aims to 'CERIFy' existing research dataset metadata conventions, and hence provide access to research data in an environment which also holds information on research projects and research outputs. C4D will also explore the commonality of research dataset metadata, and how much can be represented in CERIF.

Monday, 29 August 2011

Research data management in crystallography at the XXII IUCr Congress


  On Aug 29th a session on research data management will be held at the XXII Congress of the International Union of Crystallography (IUCr2011). The session will feature talks by Brian McMahon (IUCr), Brian Matthews (I2S2 Project), Peter Murray-Rust (CrystalEye), John Westbrook (wwPDB) and Nick Spadaccini (DDLm). Peter Murray-Rust will deliver a talk on Open Crystallography during the session.

Friday, 26 August 2011

STM research data management and the Quixote Project


  A one-day seminar was held yesterday Thu Aug 25th at the Zaragoza Scientific Center for Advanced Modeling (ZCAM) on research data management and the Quixote Project for data management in Computational Chemistry. The session, entitled “Research data management: The experience of the Quixote project for Quantum Chemistry data. Can it be extended into a collection of research data management repositories?”, was attended by a rather diverse group of researchers (both computational chemists and from other disciplines) and repository managers, aiming to learn about research data management initiatives and specifically about the progress of the Quixote Project, in which two researchers from the University of Zaragoza and the CSIC Institute of Physical Chemistry "Rocasolano" are involved.

The Quixote Project (see paper "The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age", in press with the J Chem Inf) is developing the infrastructure required to convert output from a number of different molecular quantum chemistry (QC) packages -such as NWChem or Gaussian- to a common semantically rich, machine-readable format and to build repositories of QC data results.

The session started with an introduction to "STM Research data management initiatives in Spain and abroad" delivered by SONEX member Pablo de Castro, in which different national approaches to RDM were presented based mainly on the information collected at the JISC MRD Programme International Workshop held last March in Birmingham.

Different approaches to data management taken by the JISC and the SURF Foundation were discussed at Q&A time: for the JISC, datasets are assets per se, regardless of whether they are attached to a research paper as supplementary material, whereas the 'Enhanced publication' approach from the SURF Foundation in the Netherlands regards datasets mainly as digital objects connected to research publications. Some emphasis was placed on the fact that the upcoming OpenAIREPlus European project shares the SURF approach.

Two presentations on the Quixote Project followed, "From Databases in QC 2010, ZCAM, Sep 2010 onwards: a brief history of Quixote" by Jorge Estrada and "The Quixote Project: a pioneering work in managing Computational Chemistry research data" by Pablo Echenique. Both Quixote project members explained the results, the challenges and the cooperation opportunities of this non-specifically-funded RDM project, engaging in a fruitful dialogue with the attending researchers and repository managers on how the QC data assets could be best managed.

Finally Peter Murray-Rust closed the morning interventions with some reflections on the subject "Entering a new era in data management" - see his blogpost for a summary of his ideas.


In the afternoon there were joint debates on how to improve implementation of research data management initiatives. Researcher motivation for dataset sharing was extensively debated: this motivation should ideally not just arise from a given funding agency actually requiring those data to be made available, but from the sheer advantages (as summarized by Peter Murray-Rust) that doing so would bring to the research practice and communication ("improving methodology").

An independent debate session was held for discussing how to start developing some kind of research data management infrastructure in those countries where work in this area is presently beginning. These are some recommendations that were put together by the participants in the debate:

- Some workgroup of (not just library-based) IT professionals should be put together for analysing the current infrastructure and the opportunities for launching new initiatives upon potentially reusable pre-existing ones,

- It would be advisable to analyze the researcher behaviour and needs in terms of storing their datasets into international platforms for data sharing (in case they are available for their specific discipline),

- It would be interesting to examine the motivation for data sharing from research groups in different research areas, so that initial efforts to develop data management infrastructures can start working with those areas more willing to share their data (Earth Sciences recurrently showing up when analysing the international perspective),

- Pioneering initiatives for providing services to STM researchers regarding data handling and storing from given Institutional Repositories (such as eSpacio UNED and Digital.CSIC) should be highlighted as a role model to be spread,

- The OpenAIREPlus/SURF Enhanced papers approach could be a good starting point for Institutional Repositories to work at, by finding out which of their presently filed papers have supplementary data attached at the journal site and trying to independently manage those ones,

- A need was detected along the session talks with researchers for a dataset management system at research centres for basic internal organisation purposes. Datasets filed in this internal storage system may or may not be aimed for publication,

- Production and publication of potentially citable datasets should be acknowledged as a relevant scientific contribution for research assessment purposes,

- There are big differences in needs, procedures and required infrastructure regarding data management between Big Science and long-tail science (the greater part actually being groups of three researchers in a lab with specific needs of their own),

- The Library is a potential supplier of know-how on data processing and storing for researchers, and that role should be promoted within the institutions,

- The Spanish e-Research National Network, mostly dealing with Grid and supercomputing initiatives, might be a good workgroup infrastructure for pioneering data management initiatives in Spain,

- There are real collaboration opportunities between the Quixote Project and the research information management infrastruture at the University of Zaragoza (two IRs being currently available, Zaguan at the University and Digital.CSIC at the Spanish Nacional Research Council, CSIC),

- Involving research staff (mainly PhD students) in the management and operation of dataset information management systems (such as the Chempound data repository at the University of Cambridge) seems a prerequisite for the success of data management initiatives,

- Due to the specific data features for various research areas, the incipient data management infrastructure available is more developed for the Social Sciences and Humanities than for STM research areas.

Monday, 11 July 2011

Sword-Sonex project extension


  "Data deposit nowadays... is mainly based upon submission by email... and remains labour-intensive"
(Simon Hodson, JISCMRD Programme manager, on present data deposit workflows)


Representatives of the JISC-funded Sword and Sonex projects met Balviar Notay and Simon Hodson (JISC) on July 6th at Brettenham House, London, to further discuss extending Sword v2 to the automated transfer of research data (see reference to last meeting on the issue on Nov 20th).

Now that the first round of JISCMRD Phase I projects is over and final reports have been published, the Sword-Sonex workteam is putting together a data transfer use case document that lists different project solutions with their advantages and shortcomings, so that it can analyse how Sword might aid the automation of dataset transfer into repositories (or similar target resources for research data). The team will liaise with several JISCMRD projects in order to find out their specific approach to the data transfer issue. The time schedule for the extended Sword project (coordinated by Paul Walk, UKOLN) is as follows:

WP1: Identify key projects & individuals who have relevant information and skills regarding datasets [Jul 6-13]

WP2: Document the dataset use cases in collaboration with Sonex [Jul 18-end Aug]

WP3: Interpret the data set use cases as processes carried out with Sword [Sep 5-24]

WP4: Carry out gap analysis on dataset use cases on Sword and recommend future work, and produce a web resource for any new or existing JISC projects (such as those in JISCMRD2 Programme) to refer to, which will provide all the relevant information regarding dataset deposit [Sep 27-Oct 21]

WP5: Identify key Sword clients and potential client environments, accept and evaluate proposals, issue development contracts [Jul 6-Aug 15]

WP6: Development of 1, 2 or 3 client environments [Sep 5-end Oct]

WP7: Project management and administration [Jul 6-end Oct]

Sunday, 15 May 2011

A first analysis of data management


As previously mentioned in this blog, the Sonex workgroup is now trying to extend its use case scenario analysis on 'Deposit opportunities into repositories' to the realm of research data. A first meeting held at EDINA on Mar 30th served the purpose of drawing a general picture of the data management landscape.

It should be stressed that the ways of handling SSH and STM data may differ substantially. Given the strong IASSIST attachment of some Sonex members, the workgroup's initial approach to data management may therefore be a bit biased towards procedures in the area of Social Sciences and Humanities. However, attention will also be paid to specific ways of dealing with STM datasets as the analysis gets fine-tuned.

Moving along the same lines as we did for research articles, we first try to tackle the ACTIONS scope. Data deposit is certainly an issue, but there's more to data-related processes than just deposit. It's also about Access to data and also about Data Notification/Register.

Next we get on to the WHAT and the WHO. Answer to WHAT? is a data set. Previous analysis by Peter Burnhill shows -at least- three different types of research data (see image below).


Dealing mainly with the data file itself, this data type classification is somewhat narrow for the general picture of data management, so Sonex would rather set a new and more generic data classification for answering the question WHAT is there to deposit:

  • Metadata record

  • Codebook or user guide, where all necessary information is provided to allow for data re-use*

  • Raw data or dataset file(s)


* See a DCMI-based description at: Inter-university Consortium for Political and Social Research (ICPSR). (2009). Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle (4th ed.). Ann Arbor, MI. Section 'Important documentation elements', p. 22

These three elements should ideally be supplied as a single package.
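As an illustration only, the single-package idea could be sketched as a small Python helper that bundles the three elements into one zip archive. The function name `build_deposit_package`, the `metadata.json` file name and the `codebook/` and `data/` folder layout are assumptions made for this sketch, not a format proposed by Sonex or by any of the projects mentioned.

```python
import json
import zipfile
from pathlib import Path


def build_deposit_package(metadata, codebook_path, data_paths, package_path):
    """Bundle a metadata record, a codebook and data files into one zip archive.

    The internal layout (metadata.json, codebook/, data/) is a hypothetical
    convention chosen for this example.
    """
    package_path = Path(package_path)
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as package:
        # 1. Metadata record, serialised as JSON
        package.writestr("metadata.json", json.dumps(metadata, indent=2))
        # 2. Codebook or user guide
        package.write(codebook_path, arcname="codebook/" + Path(codebook_path).name)
        # 3. Raw data or dataset file(s)
        for data_path in data_paths:
            package.write(data_path, arcname="data/" + Path(data_path).name)
    return package_path
```

A depositor would call it with a metadata dictionary, the path to the codebook and a list of data file paths; the resulting archive is what would be handed over in a single deposit operation.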

As to the question of WHO performs each data-related operation (Notification-Deposit-Grant access), a handful of running projects within the JISC MRD (phase I) programme should serve to test the different use cases resulting from a double-entry 'Action/What' table as featured below.
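A minimal sketch of how such a double-entry 'Action/What' table could be enumerated in Python follows. The action and item labels are taken from the text above; the function name `use_case_matrix` is hypothetical, and the WHO cells are deliberately left empty because they are exactly what the project survey is meant to fill in.

```python
from itertools import product

# Labels taken from the analysis above
ACTIONS = ["Notification", "Deposit", "Grant access"]
ITEMS = ["Metadata record", "Codebook or user guide", "Raw data or dataset file(s)"]


def use_case_matrix(actions=ACTIONS, items=ITEMS):
    """Return one empty cell per (action, item) pair.

    Each cell is meant to hold WHO performs the action; None marks a cell
    that has not been filled in yet.
    """
    return {(action, item): None for action, item in product(actions, items)}


matrix = use_case_matrix()
for (action, item), who in matrix.items():
    print(f"{action:<14} | {item:<28} | {who or 'to be surveyed'}")
```

Three actions against three items give nine candidate use cases, each of which can then be checked against a running project.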


Next step as we proceed to further development of this preliminary analysis should be a survey for gathering information on procedures for data handling as carried out in specific JISC MRD projects.

Wednesday, 27 April 2011

National initiatives for promoting data management strategies: an overview


- "Hello, I want to deposit my data"
- "Sir, this is a library!"
- "Sorry" -he whispers- "I want to deposit my data".
(as told by Brian Hole, British Library, along his presentation of the DRYAD UK initiative)


  The main objective of the JISC MRD International Workshop held last month was to review progress achieved by the JISC Managing Research Data Programme and to discuss this in the context of broader international developments.

As stated in the workshop programme overview, "this dimension reflects key partnerships which JISC, the JISCMRD Programme and the DCC has been building through the IDCC Conference, the Knowledge Exchange and other initiatives. They include the Australian National Data Service, the NSF funded DataNet Projects, institutions in the US and Australia, the DFG, SURF, DANS etc".

Within the broader context, besides a couple of preliminary talks on the European Union approach to (and future funding of) data management initiatives -by John Wood, on the EU 'Riding the wave' report, and by Carlos Morais-Pires on the Digital Agenda for Europe- the workshop featured a specific session on "National and international infrastructure initiatives" whose first panel was called "Approaches and strategies in the UK, US, and Germany". Australian and Dutch national or specific approaches were also discussed, either at this session or later during the event.

Besides the national initiatives featured in this and further sessions along the meeting -it was reassuring to see such a broad scope of strategies or already running projects taking place at the same time in so many different countries- there are also additional, sometimes preliminary initiatives for promoting data management policies at national or institutional level in other countries such as Finland, Portugal, France, Poland or South Africa.

As new initiatives for research data management keep steadily coming up, this session was an opportunity to get an informal update on DCC's report 'Comparative Study of International Approaches to Enabling the Sharing of Research Data' - see its summary and main findings here as of Nov 2008.

Digital Curation Centre - UK
Kevin Ashley, Digital Curation Centre (DCC), described the present picture of data management in the UK as "a new context", where Universities are increasingly willing to take responsibility for data management (especially in areas not covered by Data Centres).
Now that UK funder and NSF rules for Data Management Planning are being implemented, this advance planning is becoming very important for funders, researchers, institutions, collaborators and reusers. DCC's current tasks include integrating different Data Discovery Services and building institutional capacities: skills, policies, etc. Besides that, DCC is providing the new DMP Online service for producing and maintaining Data Management Plans.
Good news is that, despite varying degrees of involvement, institutions in the UK have accepted their role in RDM.

NSF-funded DataNet Projects - US
A summary of present state of research data management in the US was provided by presentations of the DataONE and DataConservancy initiatives, resp. delivered by William Michener (University Libraries at U New Mexico) and Sayeed Choudhury (Johns Hopkins University).

After stating that "researchers are presently using 90% of their time managing data instead of interpreting them", W. Michener presented the Data Observation Network for Earth (DataONE) initiative (a live DataONE presentation at U of Tennessee is available). This NSF-supported initiative aims to ensure preservation of and access to multi-scale, multi-discipline, and multi-national science data. DataONE Coordinating Nodes around the world will help achieve the international collaboration needed for solving the grand science and data challenges, particularly with regard to education.

The DataConservancy initiative aims to research, design, implement, deploy, and sustain data curation infrastructure for cross-disciplinary discovery with an emphasis on observational data. S. Choudhury's presentation stressed the need for data preservation as a necessary condition for data reuse and introduced the recent connection of data and publications through arXiv.org as one of the pilot projects that build upon the Project APIs.

DFG - Germany
New DFG information infrastructure projects in Germany were presented by Dr. Stefan Winkler-Nees, who mentioned both the Jan 2009 DFG Recommendations for Secure Storage and Availability of Digital Primary Research Data, as a base report for promoting standardized work in the data management area, and the DFG's running call for proposals "Information infrastructures for research data". Projects selected under this call are due to be announced shortly and will start in May/Jun 2011. Finally, in a common line of thought with other initiatives, Dr. Winkler-Nees mentioned that DFG is aiming for teaching and qualification of both researchers and data curators.


SURF Foundation & DANS - The Netherlands
Later on in the workshop, John Doove presented the SURF Enhanced Publications initiative within the SURFshare programme 2007-2011. Six new projects funded during 2011 by the SURF Foundation will allow researchers from a variety of disciplines to share datasets, illustrations, audio files, and musical scores with fellow researchers in the context of Enhanced Publications (programme video available on YouTube). There were already two previous grant rounds for Enhanced Publications. The six running projects, whose results are due in May 2011, take place within five disciplines: Economics (Open Data and Publications, Tilburg University), Linguistics (Lenguas de Bolivia, Radboud University Nijmegen, and Enhanced NIAS Publications, KNAW-Royal Netherlands Academy of Arts and Sciences), Musicology (The Other Josquin, Utrecht University), Communication sciences (Enhancing Scholarly Publishing in the Humanities and Social Sciences, KNAW) and Geosciences (VPcross, KNAW).

The Dutch strategy for increasing research data available online was completed with the presentation "Sustainable and Trusted Data Management" delivered by Laurent Sesink (DANS-Data Archiving and Networked Services). DANS, est. 2005, deals with storage and continuous accessibility of research data in the social sciences and humanities and promotes the 'Data Seal of Approval' for certification of data repositories, guaranteeing via a series of required criteria a qualitatively high and reliable way of managing research data.

Australian National Data Service (ANDS) - Australia
Finally, Andrew Treloar, Director of Technology, Australian National Data Service (ANDS), supplied a comprehensive perspective from a national infrastructure provider and in a way summarized previous talks by saying that, despite differences, there are common themes emerging in national approaches to data management, as there are things only they can do. In his plenary presentation "Data: Its origins in the past, what the problems are in the present, and how national responses can help fix the future" he mentioned, for instance, that Hubble Space Telescope-related publication statistics show that twice as much research is being done thanks to data reuse. Efficiency, validation, integrity of scholarly records, value for money and self-interest were listed as (non-altruistic) arguments for data reuse.

Having the chance to attend this series of brilliant presentations and checking out how policies for opening access to research data keep spreading over institutions and countries were undoubtedly part of the Birmingham workshop highlights. Next opportunity for keeping up with it all will be next November at the Knowledge Exchange Workshop on Research Data Management in Bonn, Germany.

Sunday, 13 March 2011

Strategies for research data deposit in ongoing data management projects


  Before starting pattern analysis for research data deposit into (institutional or subject-based) data repositories –whether or not open access– the first step for Sonex is to scope ongoing projects dealing with that kind of deposit, as well as already closed projects which supplied relevant guidelines on the subject. A list of projects working on data management follows, with their specific approach to actual data deposit as taken from project blogs:


TARDIS (Monash University–Australian National Data Service).
“There is a pressing need for the archival and curation of raw X-ray diffraction data. However, the relatively large size of these datasets has presented challenges for storage in a single worldwide repository. This problem can be avoided by using a federated approach, where each institution or university utilizes its institutional repository”.


ADMIRAL: A JISC-funded data management infrastructure for research across the life sciences.
"The purpose of the ADMIRAL Project is to create a two-tier federated data management infrastructure for use by life science researchers, that will provide services (a) to meet their local data management needs for the collection, digital organization, metadata annotation and controlled sharing of biological datasets; and (b) to provide an easy and secure route for archiving annotated datasets to an institutional repository, The Oxford University Data Store, for long-term preservation and access, complete with assigned Digital Object Identifiers and Creative Commons open access licences".
(See Oxford University Library Services' Databank)


XYZ Project. “The XYZ Project will create a demonstrator of a new workflow for publishing data in support of full-text. The author prepares data for publication (if possible with validation) in a third-party trusted repository before the paper is submitted to a publisher. Our software will manage the deposition, release to reviewers, dis-embargo and for conventional publication or as a data journal. Two Open Access publishers (International Union of Crystallography and BioMed Central) are engaged with the project and will test the new workflow”.
Anticipated Outputs and Outcomes: A demonstrator repository hosted by the IUCr.


FISHnet: Freshwater information sharing network. “This project will allow researchers in multiple academic, governmental and voluntary-sector institutions to share their data. Data will be held securely in a sustainable subject repository which preserves and disseminates multiple datasets as part of the FreshwaterLife.org information portal. Data creators will be able to manage access rights to their content, from Open Access to sharing with trusted colleagues”.


DMBI: Data Management in Bio-Imaging. “The quantity of data generated by modern high-throughput bio-imaging systems presents a significant challenge in both data management and processing. Furthermore, there is no explicit system/way to record the processing algorithms and parameters that are used to produce results. Thus there is no strong link between images, software and results. This projects aims to address these issues”.
Anticipated Outputs and Outcomes: Build a prototype DMBI system around OMERO.


CaiRO: Curating Artistic Research Output. “No prominent subject-based repository exists to act as the custodians of arts practice-as-research data. Where institution provision for data management is in place (for instance, an institutional repository service) the arts researcher-practitioner cannot always rely on an understanding of the special nature of arts research data. More commonly, data is retained in departmental collections, built and maintained by small teams which often include researchers themselves”.


BRIL: Biophysical Repositories in the Lab. “The BRIL project aims to enhance the repository facilities at the Randall Division of Cell and Molecular Biophysics at King’s College London. This will involve:
» Embedding the repository within the researchers’ day-to-day research and experimental practices;
» Integrating the repository into the wider King’s infrastructure”.
Example of KCL “internal” repository: Mutation Testing Repository.


ADS+: Enhancing and Sustaining the Archaeology Data Service digital repository. The project aims to “Increase the sustainability of the ADS, by implementing Fedora (Flexible Extensible Digital Object Repository Architecture). This is a world-leading open source digital repository application which will allow the automation of many ADS curatorial functions, according to the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003). This will help ensure the long term preservation of all ADS digital archives, as well as making the ADS archival procedures more cost-effective”.


IDMB (Institutional Data Management Blueprint) Project, U. Southampton.
The project’s aims are to provide the University of Southampton with a ten-year roadmap for delivery of a comprehensive data management infrastructure.

[IDMB Recommendations] The data management audit and gap analysis indicates where improvements can be made in the short, medium and long-term to improve data management practices and capabilities at the University. The following preliminary recommendations are put forward for short (one year), medium (one to three years), long (more than three years) term action.
[Short Term (1 year)] Crucial to supporting researchers is the consolidation of data management into a coherent framework that is easy to understand, use, and has a sustainable business model behind it. A number of major recommendations are put forward here for the short-term:
• Create an institutional data repository
• Develop a scalable business model
• One-stop shop for data management advice and guidance


MaDAM: Pilot data management infrastructure for biomedical researchers at University of Manchester.
A pilot infrastructure for Biomedical Researchers at the University of Manchester, which covers data capture, data storage and data curation. This infrastructure comprises procedural support, hardware and software.
[18/03/2010] The development team have built a prototype data management front end which fits a generic set of needs amongst our Life Sciences researchers. It is aimed at being flexible enough to allow researchers themselves to assign attributes (i.e. metadata) to their experiments and datasets for them to be usefully categorised and tagged. The prototype is also entirely dispensable and intended as a catalyst for feedback from our use cases on their specific functionality requirements.


DISC-UK DataShare Project. The DISC-UK DataShare project, led by EDINA National Data Centre and the Edinburgh University Data Library, with partners at the Universities of Southampton and Oxford, has advanced the current provision of repository services for accommodating datasets in the UK.
Key conclusions: 1) Data management motivation is a better bottom-up driver for researchers than data sharing but is not sufficient to create culture change, 2) Data librarians, data managers and data scientists can help bridge communication between repository managers & researchers, 3) Institutional repositories can improve impact of sharing data over the internet.

Thursday, 3 March 2011

Repository take-up and embedding: the future of repositories


  Being already in Birmingham for the JISC Deposit Project Meeting on Mar 1st, Sonex stayed in town to attend the JISC Repositories Take-Up and Embedding Meeting as well. The start-up meeting for this new JISC programme aimed to outline the future of repositories, dealing with specific issues such as (automated) deposit, shared services like RoMEO or OpenDOAR, repository integration into general software infrastructures for research information management and promoting national (via RSP) and international (via KE, COAR and OpenAIRE) collaboration.

Six projects were presented at this programme start-up meeting:

- Bringing a Buzz to NECTAR (Miggie Pickton, University of Northampton)
- Hydrangea: letting the repository flower (Richard Green, University of Hull)
- MIRAGE 2011: Repository Enrichment from Archiving to Creation (Xiaohong Gao, Middlesex University)
- Enhanced interface design for supporting take-up and embedding of the Glasgow School of Art research repository, including visual engagement with practice led and applied outputs (Robin Burgess, Glasgow School of Art)
- eNova (Marie-Therese Gramstadt, VADS)
- EXPLORER: Embedding eXisting & Proprietary Learning in an Open-source Repository to Evolve new Resources (Alan Cope, De Montfort University)

An extra postprandial presentation on repository consolidation within a university research information management environment, and the way it was done at the University of Glasgow's Enlighten IR, was delivered by William Nixon. Statements like "Silos are the past, embedding repositories -through the use of tools like Sword or LDAP- is the future" made the point on how repositories should evolve. According to William, repositories are to exploit new opportunities for data mining, business intelligence, KPIs, analytics, 'stickiness' and visibility (some of these issues being thoroughly dealt with at the Enlighten repository blog).

There was a remarkable presence of image-related projects among the presentations, with Glasgow School of Art, eNova and MIRAGE 2011 dealing with archiving of images into repositories one way or another. This is great news for the momentum-gaining development of new information infrastructures in the area (also traceable at the JISC Deposit Programme meeting the day before), which will no doubt benefit from these projects' outcomes.

After watching project presentations from a Sonex point of view, it seems they could particularly benefit from interacting with JISC Deposit projects in terms of implementing resulting strategies for automated content ingest into repositories. A handful of the take-up and embedding projects would thus be the soundest candidates for initial "customer implementation" of the various resulting methods for quick population of repositories with institutional research output (the take-up bit, prior to embedding) coming from the Deposit strand. As these projects will run until the end of 2011 and the ones from the Deposit strand should deliver around July, interaction among them could probably be easily achieved.

There was one particular project among those presented that captured Sonex's attention: MIRAGE 2011, Middlesex Medical Image Repository with a Content-Based Image Retrieval Systems Archiving Environment. MIRAGE is both an image-related repository project (as it deals with medical images) and a research data project, and it is this latter feature that brings it fully within the scope of Sonex activity with regard to research data management. Ongoing data management projects (either JISC-funded or otherwise) usually deal with either numerical or textual data, but projects dealing with the deposit of graphical research data are rare (save for the Data Management in Bio-Imaging - DMBI project run at The John Innes Centre, BBSRC, Norwich).

A couple of references were shared with MIRAGE project manager Dr. Xiaohong Gao so as to promote synergies among different projects in the same area: the 'Feeding Neuroimaging Repositories' poster presented at OR2010 Madrid last July by a team of Universitat Autònoma de Barcelona (UAB)-Hospital de la Santa Creu i Sant Pau researchers in Barcelona, and the MIDAS/National Alliance for Medical Image Computing (NAMIC) medical image repository.

The meeting presentations will shortly be available.

Sunday, 16 January 2011

"On such a full sea are we now afloat"

This quotation -from W. Shakespeare's 'Julius Caesar'- closed Drs. Eefke Smit's talk "Taking the Current when it Serves: Research Data from the Publisher's Perspective", delivered at 'Academic Publishing in Europe': the APE 2011 conference, held at the Berlin-Brandenburg Academy of Sciences in Berlin, Jan 11-12th, 2011.


Aiming to gather some facts for its ongoing analysis of research data management and data deposit into repositories, Sonex just attended APE2011, a meeting for the publishing industry and its environment held yearly in Berlin since 2006. The conference organisers regularly publish a brief official report shortly after the event (reports on previous APE editions available here, report on this edition due shortly).

This particular visit to Berlin offered the chance to attend yet another event besides APE2011: the SOAP Symposium. The final report of the SOAP (Study of Open Access Publishing) project survey was presented at this one-day meeting, held on Jan 13th in the Goethe Room of the renowned Harnack-Haus in Berlin. The SOAP project describes and analyses the open access publishing landscape and explores the risks and opportunities of the transition to open access publishing for libraries, publishers and funding agencies - see preliminary survey results; the final report will be available as of next March.

The conference programme for APE2011, entitled "Smarter Publishing in the New Decade", included promising topics such as the evolution of peer review and ways to improve it, the so-called data deluge, business opportunities in China and how Open Access is becoming increasingly mainstream within the publishing environment. Discussions on those matters were lively both at round tables and during lunch breaks. Since Sonex's interest is mainly in research data management, the rest of this report will focus on presentations and debates on the subject.

On the afternoon of Tuesday Jan 11th, a session was held on “The Data Deluge: to Drown or to Swim?”, chaired by Bob M. Campbell. Herbert Gruttemeier, INIST-CNRS, started his presentation "Helping to Ride: a look at data sharing and access policies" by reminding the audience that, since we were in Berlin, the definition of an Open Access Contribution on page 1 of the Berlin Declaration on Open Access to Knowledge includes “raw data and metadata”. Some highlights from his talk were:


  • There is a large number of Data Sharing Policies being defined by administrations, institutions, funding agencies and publishers themselves under the guideline "data should be made as freely and widely available as possible". See for instance NSF’s requirement for submission of data management plans of May 10th, 2010, under general policy statement “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants”.
    Or the very recent (Jan 10th, 2011) commitment by a group of major international funders of public health research to “work together to increase the availability of data emerging from our funded research, in order to accelerate advances in public health”.

  • Publishers such as BioMed Central were featured as high-profile supporters of Open Data (see Dec 11th, 2010 post at this blog), and NPG editorial policy on dataset sharing was specifically mentioned in the talk, as well as the Brussels Declaration on STM Publishing statement that “Raw research data should be made freely available to all researchers”. Finally, discipline-based data policies were cited, such as the PaN-Data Scientific data Policy Draft for Scientific Data Management Framework at European Photon and Neutron Facilities or the Joint Data Archiving Policy (JDAP) adopted in a coordinated fashion by Dryad partner journals.

  • Not everything is that simple though: the Nov 2009 RIN report "Patterns of information use and exchange: case studies of researchers in the life sciences” shows that researchers are not so eager to share their data with others, and that ‘one-size-fits-all’ information and data sharing policies may not achieve the goals they are aiming for, namely scientifically productive and cost-efficient information use in life sciences.

Drs. Eefke Smit, International Association of STM Publishers, provided a counterexample to these growing data sharing policies by publishers in her talk on "Research Data from the Publisher's Perspective": the Journal of Neuroscience has stopped accepting supplementary material from authors since Nov 1st, 2010, as the procedure posed too heavy a burden on paper reviewers.
She also warned of the so-called data deluge, according to which tera- and petabyte-sized datasets will increase their share in research projects in upcoming years.
However, when researchers are asked where they would like to submit their research data, the answer is more often than not "publishers". This brings along the issue of research data preservation: results of an internal survey by STM Publishers show what she called “an improvable situation” with regard to preservation.

The planned talk “Data Publishing in the Context of the ICSU World Data System” by Dr. Michael Diepenbroek, Director of WDC-MARE/PANGAEA, University of Bremen, was finally dropped from the conference programme. However, the next speaker, Dr. Jan Brase, Managing Director of DataCite, provided some information on the progress of one of the main databases for research data in the geosciences area, for instance by stating there was “a wide cooperation between Elsevier and PANGAEA via DOI-based external links from online papers” at the former’s platforms. This kind of cooperation between publishers and international databases for handling research data might be useful for tackling the abovementioned data preservation issues.
Dr. Brase, affiliated with the German National Library of Science and Technology Hannover, also described the evolution of the DataCite international project as it gets carried out by local member institutions: as of Dec’10, over 1M records are already registered with DOI names at datacite.org. Perspectives for the project include setting up a Central Metadata Base as of Jun'11; DataCite becoming a harvest point for third parties such as WoS; and cooperation via CrossRef for data-article lookup.

The data management session ended with the talk on “Managing Publication and Research Data: the eSciDoc Research Infrastructure” by Dr. Malte Dreyer from Max Planck Digital Library (MPDL). eSciDoc is a joint project of the Max Planck Society and FIZ Karlsruhe, funded by the Federal Ministry of Education and Research (BMBF), with the aim to realize a next-generation platform for communication and publication in research organizations. Further eSciDoc projects mentioned in the presentation and dealing with research data management were ‘Astronomer’s Workbench’ (astronomy), Lifecycle Logger (biochemistry) and BW-eSci(T) for computational linguistics. DARIAH (Digital Research Infrastructure for the Arts and Humanities) –in whose development eSciDoc is directly involved– and CLARIN (Common Language Resources and Technology Infrastructure) projects were repeatedly highlighted during the session as leading EU projects on development of digital research infrastructure (including data management) for the Humanities and Social Sciences.


A joint panel discussion was then held after the presentations on research data management, with speakers taking questions from the floor. Alicia Wise, Elsevier Director of Universal Access and a former archaeologist, raised the issue of the costs attached to research data management and who should fund them: the panellists agreed that national funding bodies should assume the cost of data management. In the course of her question, Dr. Wise incidentally mentioned that data management at the archaeological research project she used to work for succeeded only thanks to researchers dedicating 50% of their time to data curation. Sonex will examine this aspect of dataset deposit in order to identify alternative (automatic) curation procedures currently being used to relieve researchers of the data curation burden.

The data management issues extended well beyond the session specifically devoted to them and into the Innovation session held the next day, where the presentation by Adam Marshall of Portland Press on the Semantic Biochemical Journal and Project Utopia at the Manchester School of Computer Science dealt extensively with data handling (see “Calling International Rescue: knowledge lost in literature and data landslide!” at Biochem J. (2009) 424, 317–333 for a review on “how to provide new ways of interacting with the literature, and new and more powerful tools to access and extract the knowledge sequestered within it”).

At the end of the data session panel discussion, Dr. Eefke Smit summarized the three challenges of research data management: normalization, standardization and migration. She also reminded the audience of the verses following the one quoted in the title of this post:

(…) On such a full sea are we now afloat,
And we must take the current when it serves,
Or lose our ventures.

Sunday, 12 December 2010

A preliminary list of discipline-specific projects on research data management

A preliminary list follows of currently running discipline-specific projects and initiatives (as of Dec 2010) dealing with research data management. The list below is not comprehensive, but rather a sample of ongoing projects, brought together in order to identify potential biases by area in current research data management projects. Should any relevant projects be missing, we’d appreciate being notified so that they can be included as well.

[projects/initiatives listed in alphabetical order]

Project name: ACRID: Advanced Climate Research Infrastructure for Data
Institution/Funder/Manager: U East Anglia, STFC, Met Office, JISC
Project Description: The ACRID Project aims to develop an approach to publishing climate research data in a way that facilitates citing, re-use and the provision of full provenance information for processed data.
Area/Discipline: Climate Science


Project name: ADMIRAL
Institution/Funder/Manager: U Oxford, JISC
Project Description: A data management infrastructure for research across the life sciences
Area/Discipline: Life Sciences


Service/Project name: ADS: Archaeology Data Service
Institution/Funder/Manager: U York, AHRC, JISC, EU (mandated repository for AHRC, NERC)
Service/Project Description: The Archaeology Data Service supports research, learning and teaching with high quality and dependable digital resources. It does this by preserving digital data in the long term, and by promoting and disseminating a broad range of data in archaeology. The ADS promotes good practice in the use of digital data in archaeology, it provides technical advice to the research community, and supports the deployment of digital technologies.
ADS is actively engaged with research projects working with partners in all sectors of UK archaeology.
Area/Discipline: Archaeology


Project name: Global Argo Data Repository
Institution/Funder/Manager: NOAA, NODC (National Oceanographic Data Center), GODAE (Global Ocean Data Assimilation Experiment), IFREMER (Institute for Research and Exploitation of the Sea)
Project Description: In the year 2000, a global array of approximately 3,000 free-drifting profiling floats, known as the Argo Ocean Profiling Network, was planned as a major component of the ocean observing system. Argo originated from the need to make climate predictions on both short and long time scales and has led to international participation and collaboration to ensure global coverage.
Centers to handle the data collected by profiling floats have been established in a number of countries. These centers normally handle data from their nationally deployed floats, but sometimes provide that service to other countries or organizations. All Argo data will be publicly available in near real-time via the GTS (Global Telecommunications System) and in scientifically quality-controlled form with a few months delay.
Area/Discipline: Marine Sciences, Oceanography


Project name: BlueObelisk
Institution/Funder/Manager: Group of chemists/ programmers/informaticians
Project Description: The Blue Obelisk Data Repository lists many important chemoinformatics data such as element and isotope properties, atomic radii, etc. including references to original literature
Area/Discipline: Chemoinformatics


Project name: BRIL: Biophysical Repositories in the Lab
Institution/Funder/Manager: CeRch-KCL, JISC
Project Description: The BRIL project aims to enhance the repository facilities at the Randall Division of Cell and Molecular Biophysics at King’s College London by:
- Embedding the repository within the researchers’ day-to-day research and experimental practices
- Allowing data and metadata to be captured in automated fashion
- Allowing the structure of experimental processes as a whole to be captured, modelled and stored within the repository
- Enhancing browse and access facilities and data exchange facilities to increase interoperability.
Area/Discipline: Biophysics


Project name: CAiRO: Curating Artistic Research Output
Institution/Funder/Manager: U Bristol, DCC, JISC
Project Description: Research data created by the UK’s performance and visual arts departments is often rich, technically complex and amazingly varied in nature. This work may include interconnected multimedia records of a single live event or software which exhibits complex behaviours dependent upon the choices made by a viewer. The CAiRO project, funded as part of the wider JISC Managing Research Data programme, aims to offer data management skills tailored to the special requirements of the arts researcher-practitioner.
Area/Discipline: Creative Arts


Project name: The CEACS Data Library
Institution/Funder/Manager: CEACS Library, Center for Advanced Study in the Social Sciences (CEACS), Instituto Juan March, Madrid, Spain
Project Description: The CEACS Data Library provides support to its research community in conducting quantitative research with primary and secondary data. The Data Library has a collection of over 2,000 secondary research datasets from major data centres. The service supports research data management through a thematic website, one to one support and a Dataverse data repository to help with the management, sharing and preservation of the data produced by researchers.
Area/Discipline: Social Sciences


Project name: Data Conservancy: A New Vision for Data-Driven Science
Institution/Funder/Manager: National Science Foundation (NSF), Johns Hopkins University (Lead institution)
Project Description: The Data Conservancy (DC) embraces a shared vision: scientific data curation is a means to collect, organize, validate and preserve data so that scientists can find new ways to address the grand research challenges that face society.
Area/Discipline: Astronomy, Earth Sciences, Life Sciences and Social Sciences


Project name: DataONE
Institution/Funder/Manager: National Science Foundation (NSF)
Project Description: DataONE was conceived to ensure preservation and access to multi-scale, multi-discipline, and multi-national data about life on earth and the environment that sustains this life. It was recognized from the outset that such data are often difficult to discover, access, integrate and analyze.
Area/Discipline: Earth & Life Sciences


Project name: DataTrain
Institution/Funder/Manager: U Cambridge, ADS, DCC, JISC
Project Description: The DataTrain project aims to build on findings and tools developed in the Incremental project (JISC 07/09 funding strand), to design discipline-focused data-management training modules for post-graduate courses in Archaeology and Social Anthropology at the University of Cambridge.
Area/Discipline: Archaeology, Social Anthropology


Project name: DATUM for Health: Research data management training for health studies
Institution/Funder/Manager: Northumbria U, DCC, JISC
Project Description: This collaborative project seeks to promote research data management skills of postgraduate research students in the health studies discipline through a specially-developed training programme which focuses on qualitative, unstructured research data.
Area/Discipline: Health Sciences


Project name: DMBI: Data Management in Bio-Imaging
Institution/Funder/Manager: The John Innes Centre (BBSRC), Norwich BioScience Institutes, JISC
Project Description: DMBI aims to raise the level of data management/handling for high-throughput bio-imaging, and strengthen the interactions between image data silos, both internally and with partner organisations.
Area/Discipline: Biology/Bio-imaging


Project name: DMP-ESRC: Data management planning for ESRC research data-rich investments
Institution/Funder/Manager: UK Data Archive (UKDA), Economic and Social Research Council (ESRC), Joint Information Systems Committee (JISC)
Project Description: The Data Management Planning (DMP) project aims to increase the data management and sharing capability within the social sciences community.
Area/Discipline: Social Sciences


Project name: DMTpsych: Data Management Training for psychologists
Institution/Funder/Manager: U York, U Sheffield, Sheffield Hallam U, DCC, JISC
Project Description: The aim of DMTpsych is to build capacity and skills within psychology postgraduates relating to research data management. The project builds upon existing research data management materials developed by the Digital Curation Centre (DCC) to create discipline-focused postgraduate training materials that can be embedded into postgraduate research training for the psychological sciences.
Area/Discipline: Psychology


Project name: DRYAD UK
Institution/Funder/Manager: British Library, University of Oxford, JISC
Project Description: Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences as published by a Consortium of Journals. Dryad UK aims to expand Dryad into the UK by establishing a UK mirror site and extending service to new publishers and disciplines.
Area/Discipline: Biomedical Sciences


Project name: EDgrid Central: Data Repository System for 3-D Full-Scale Earthquake Testing Facility
Institution/Funder/Manager: National Institute for Advanced Industrial Science and Technology, Japan
Project Description: A data repository system called EDgrid Central is designed for storing the huge amounts of experiment data generated by a 3-D full-scale earthquake testing facility. EDgrid Central provides large storage capacity and implements a data model for the shake test in the backend. The frontend is a portal for users to retrieve the stored data by metadata search and bulk download. The system builds on NEEScentral, developed by the NEES project in the United States, and enhances its search and download functionalities according to the EDgrid users' requirements. EDgrid Central allows facility sites to have a permanent repository of the shaking table experiments, and it also enables civil engineering researchers to share their data and reports in their daily activities.
Area/Discipline: Geophysics


Project name: EIDCSR: Embedding Institutional Data Curation Services in Research
Institution/Funder/Manager: U Oxford, JISC
Project Description: The Embedding Institutional Data Curation Services in Research (EIDCSR) project aims to address the data management and curation requirements of three collaborating research groups in Oxford, by scoping their requirements and embedding selected elements of the digital curation lifecycle, including policy, workflow, and sustainability solutions within the research process. The workflows generated by the project are intended to scale to include other research domains and the outputs should be of use to other research intensive institutions. The project runs until Dec'10.
Area/Discipline: Medical & Life Sciences


Project name: ERIM: Engineering Research Information Management
Institution/Funder/Manager: U Bath, UKOLN, JISC
Project Description: ERIM aims to specify in practical terms how effective data management can be enabled and supported in research projects, in particular to support reuse or, more broadly, what can be thought of as 're-purposing'. The project will look primarily at the engineering research domain.
Area/Discipline: Engineering


Project name: EURO VO: European Virtual Observatory
Institution/Funder/Manager: CNRS, ESO, INAF, U Edinburgh
Project Description: The Virtual Observatory (VO) is an international astronomical community-based initiative. It aims to allow global electronic access to the available astronomical data archives of space and ground-based observatories and other sky survey databases. It also aims to enable data analysis techniques through a coordinating entity that will provide common standards, wide-network bandwidth, and state-of-the-art analysis tools. The EURO-VO project aims at deploying an operational VO in Europe. Its objectives are the support of the utilization of the VO tools and services by the scientific community, the technology take-up and VO compliant resource provision and the building of the technical infrastructure.
Area/Discipline: Astronomy


Project name: FISHnet
Institution/Funder/Manager: Centre for e-Research, King’s College London, JISC
Project Description: Freshwater information sharing network
Area/Discipline: Freshwater Biology


Project name: HALOGEN - History Archaeology Linguistics Onomastics and GENetics
Institution/Funder/Manager: U Leicester, JISC
Project Description: The cross-disciplinary Roots of the British collaboration between scholars in humanities and genetics seeks to interrogate the evidence for the migration and/or continuity of human populations in the British Isles in the distant past. The HALOGEN project will support the data management needs of the researchers involved and thus establish organisational best practice in terms of data management planning and the support of diverse cross-disciplinary research data.
Area/Discipline: Ancient history/Genetics


Project name: I2S2
Institution/Funder/Manager: UKOLN/DCC/Soton/STFC, JISC
Project Description: Infrastructure for integration in structural sciences
Area/Discipline: Chemistry (with a view towards inter-disciplinary application)


Project name: Incremental: A step by step approach to informing, improving, & increasing research data curation practice
Institution/Funder/Manager: Cambridge University Library, Humanities Advanced Technology and Information Institute (HATII) at U Glasgow, DCC, JISC
Project Description: The aim of Incremental is to inform, improve and increase research data curation within UK HEIs, by providing exemplars and resources for others to use. Specific objectives are: (1) to investigate current practices and requirements at each institution; (2) to develop a plan for addressing these requirements; (3) to pilot tools and services at each HEI and then make further adjustments and recommendations; (4) to embed the work within each institution; and (5) to deliver resources and findings to the DCC, DPC and JISC for wider dissemination. In addition to resources, the project will seek to provide information about their cost and sustainability.
Area/Discipline: Archaeology, Chemistry, English, Engineering and Medicine


Project name: IODP: Integrated Ocean Drilling Program
Institution/Funder/Manager: National Science Foundation (NSF), Japan’s Ministry of Education, Culture, Sports, Science and Technology (MEXT)
Project Description: IODP is an international marine research program that explores Earth's history and structure recorded in seafloor sediments and rocks, and monitors subseafloor environments. IODP builds upon the earlier successes of the Deep Sea Drilling Project (DSDP) and Ocean Drilling Program (ODP), which revolutionized our view of Earth history and global processes through ocean basin exploration.
The IODP oversees repositories around the world. Samples are distributed according to ODP and IODP policies.
Area/Discipline: Marine Sciences


Project name: MaDaM
Institution/Funder/Manager: Manchester eResearch Centre, JISC
Project Description: Pilot data management infrastructure for biomedical researchers
Area/Discipline: Biomedical Sciences


Project name: Managing Research Data: Gravitational Waves (MRD-GW)
Institution/Funder/Manager: STFC, University of Glasgow, JISC
Project Description: MRD-GW aims to examine the way in which Big Science data is managed, and produce recommendations as appropriate. Gravitational Wave (GW) data generated by the LIGO Scientific Collaboration (LSC) will be used as a case study.
Area/Discipline: Particle physics/Astronomy


Project name: PANGAEA
Institution/Funder/Manager: Alfred Wegener Institute for Polar and Marine Research (AWI), DFG
Project Description: Publishing Network for Geoscientific & Environmental Data
Area/Discipline: Earth Sciences


Project name: PEG-BOARD
Institution/Funder/Manager: School of Geographical Sciences, University of Bristol, JISC
Project Description: Palaeoclimate and environment data generation - building open access to research data
Area/Discipline: Palaeoclimatology


Project name: Quixote
Institution/Funder/Manager: U Cambridge/CSIC
Project Description: The main objective/vision of the Quixote project is to design, test and deploy a modular, open source system of tools that allow computational chemistry data (now sitting in the darkness of individual hard disks) to be organized, shared, and queried.
Area/Discipline: Quantum Chemistry


Project name: Research Data MANTRA
Institution/Funder/Manager: U Edinburgh/JISC
Project Description: Aims to develop open, online learning materials which reflect best practice in research data management grounded in three disciplinary contexts: social science, clinical psychology, and geoscience. The resulting materials will be embedded in three participating postgraduate programmes and made available through the Transkills programme for use by all postgraduate and early career researchers as well as made available generally through an open license. In addition to web-based 'chapters' that students can work through at their own pace, the course will include video interviews with leading academics about data management challenges, and practical exercises in handling data in four software analysis environments: SPSS, NVivo, R and ArcGIS.
Area/Discipline: Social and political science, Geoscience, Clinical psychology


Project name: SageCite: Citing network models of disease and associated data
Institution/Funder/Manager: UKOLN, U Manchester, British Library, JISC
Project Description: SageCite will develop and test a Citation Framework linking data, methods and publications. The domain of bio-informatics provides a case study, and the project builds on existing infrastructure and tools. Citations of complex network models of disease and associated data will be embedded in leading publications, exploring issues around the citation of data including the compound nature of datasets, description standards and identifiers.
Area/Discipline: Bioinformatics


Project name: ShareGeo Open
Institution/Funder/Manager: EDINA, JISC
Project Description: ShareGeo Open is a spatial data repository that promotes data sharing between creators and users of geospatial data
Area/Discipline: Geography


Project name: SPQR: supporting productive queries for research
Institution/Funder/Manager: KCL, U Edinburgh, Humboldt U Berlin, JISC
Project Description: The overall aim is to investigate the potential of linked data for integrating datasets related to classical antiquity, in particular addressing the particular challenges raised by our material – its incompleteness, uncertainty and fuzziness. We will achieve this by developing mechanisms for breaking data out of silos and exposing it as linked data, using standard ontologies, and in particular the Europeana Data Model, as the semantic “glue” for linking data into a wider network of knowledge. The ultimate objective will be to create a common corpus or “RDF warehouse” of linked Classics data that can be explored, searched and enhanced by further annotations.
Area/Discipline: Classics, Epigraphy and Archaeology


Project name: SUDAMIH
Institution/Funder/Manager: University of Oxford, JISC
Project Description: Supporting data management infrastructure for the Humanities
Area/Discipline: Humanities


Project name: TARDIS
Institution/Funder/Manager: Monash University, Australian National Data Service (ANDS), University of Sydney and some other Australian institutions
Project Description: TARDIS is a multi-institutional collaborative venture that aims to facilitate the archiving and sharing of raw X-ray diffraction images (collectively known as a 'dataset') from the protein crystallography community.
Area/Discipline: Crystallography


Project name: VAMDC Project: Virtual Atomic and Molecular Data Centre
Institution/Funder/Manager: EU, CNRS, CMSUC, UCL, OU, UNIVIE, UU, KOLN, INAF, QUB, AOB, ISRAN, RFNC-VNIITF, IAO, IVIC, INASAN
Project Description: VAMDC aims at building an interoperable e-Infrastructure for the exchange of atomic and molecular data. It embraces on the one hand scientists from a wide spectrum of disciplines in atomic and molecular (AM) Physics with a strong coupling to the users of their AM data (astrochemistry, atmospheric physics, plasmas) and on the other hand scientists and engineers from the ICT community used to deal with deploying interoperable e-infrastructure.
Area/Discipline: Astrophysics


Project name: WissGrid: Grid for Science
Institution/Funder/Manager: DFG, U Göttingen, Astrophysikalisches Institut (AIP), Alfred-Wegener-Institut (AWI), Deutsches Elektronen Synchrotron (DESY), Deutsches Klimarechenzentrum GmbH (DKRZ), Konrad-Zuse-Zentrum für Informationstechnik (ZIB), Universitätsmedizin Göttingen (UMG), Niedersächsische Staats- und Universitätsbibliothek (SUB), Technische U Dortmund (UDO), U Heidelberg, U Trier, U Wuppertal
Project Description: WissGrid’s objective is to establish long-term organisational and technical D-Grid structures for the academic world. WissGrid combines the heterogeneous needs from a variety of scientific disciplines and develops concepts for the long-term sustainable use of the organisational and technical grid infrastructure. In this context, the project aims to strengthen the organisational cooperation of scientists in the grid and to lower the entry barriers for new community grids.
Area/Discipline: Astrophysics, High Energy Physics, Climate Research, Medicine


Project name: XYZ Project
Institution/Funder/Manager: U Cambridge/IUCr/BioMed Central/Open Knowledge Foundation, JISC
Project Description: The XYZ Project will create a demonstrator of a new workflow for publishing data in support of full-text. The author prepares data for publication (if possible with validation) in a third-party trusted repository before the paper is submitted to a publisher. Our software will manage the deposition, release to reviewers and dis-embargo of the data, whether for conventional publication or as a data journal.
Area/Discipline: Crystallography


Besides this preliminary set of discipline-specific running projects related to research data (to be enriched shortly by Sonex with a complementary list of general-purpose projects dealing with research data management), a thorough list of open data repositories for all areas may be found in the data repository section of the Open Access Directory (OAD).