The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for ingest of research papers (and potentially other scholarly work) into repositories. Later on, the project scope widened to include identifying and disseminating projects being developed at institutions in relation to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.






Tuesday, 5 April 2011

I2S2 Project workshop at RAL-STFC


  During a busy week of research data management events (to be reported shortly on this blog), last Friday, Apr 1st, Sonex had the opportunity, thanks to Simon Hodson, JISC MRD programme manager, to attend the I2S2 Project workshop at the Rutherford-Appleton Laboratory (RAL) at STFC in Didcot. I2S2, standing for 'Infrastructure for Integration in Structural Sciences', is a JISC MRD project ending in Mar 2011 that aims to "identify requirements for a data-driven research infrastructure in 'Structural Science', focusing on the domain of Chemistry, but with a view towards inter-disciplinary application".


Several presentations were delivered during the meeting: Brian Matthews on the I2S2 project achievements, ICAT architecture and CSMD metadata standard; Brian McMahon, International Union of Crystallography (IUCr), on 'Information Management and Publication in Crystallography'; Tom Griffin on the TopCAT GUI for management of data coming out of the STFC ISIS and DIAMOND facilities; Steve Androulakis on the TARDIS ANDS-supported project at Monash University; Mark Borkum on OreCHEM files; Chris Morris on PiMS (Protein Information Management System); and Juan Bicarregui on the EU PANData project.

During the IUCr presentation, the need was identified for filing and preserving different data categories such as raw measurements, processed numerical data, derived information and the parameters. The value of providing access to raw diffraction images was also stressed; these files are a few GB in size, and thus not large enough for Data Centres but too big for sites such as CCDC. A review of Crystallographic Information Framework (CIF) file formats was provided, with imgCIF being used for storing raw data from the experiment, .fcf for including structure factors after data reduction, and a final stage of structure solution and refinement being performed in the lab before the author starts formatting the results into an IUCr paper, which would translate CIF into SGML for producing the final fcf, cif, pdf and html versions.

Raw data was mentioned to be kept for 183 days at STFC and 3 months at the Australian Synchrotron (in which TARDIS is involved), and a discussion followed on the fact that some agreement should be reached on the kind of data that ought to be stored and preserved. The process of attaching DOIs to datasets was also discussed, IUCr being presently involved in projects such as XYZ or Open Bibliography in order to promote this objective.


A TopCAT demo was provided by Tom Griffin. This open source GUI (see image above) is being used for storing raw data from STFC facilities such as ISIS and DIAMOND. TopCAT provides access to its contents through an open registration system, thus operating as a sort of STFC institutional data repository, and would be potentially applicable to other institutions, facilities and disciplines.

The TARDIS presentation by Steve Androulakis (Monash University, Australia) mentioned the use of XML/METS metadata standards for research data description at the federated institutional repository platform, which was initially meant to store X-ray diffraction images and later evolved into a much larger initiative with applications in microscopy (MicroTARDIS), particle physics and gene processing through the Squirrel software.

Finally, extra presentations were delivered on PiMS (Protein Information Management System) by Chris Morris, STFC, and on the European PANData project by Juan Bicarregui, STFC e-Science. PANData aims to build a Photon and Neutron Data Infrastructure through a consortium of European synchrotron facilities and neutron sources.


A final summary was made of the whole set of presented I2S2-related features (imgCIF, CIF, IUCr/XML/RDF BIBLIO, PDBML, CML, ICAT, TopCAT, ICAT Lite/CSMD, TARDIS, PiMS, PANData, NeXuS) by mapping them onto the I2S2 Idealized Scientific Research Activity Lifecycle Model (see image above; click on it for an updated version). References were also made to other initiatives not represented at the meeting, such as the Quixote Project for Computational Chemistry CML data management or Protein Production and Crystallization.

Sunday, 13 March 2011

Strategies for research data deposit in ongoing data management projects


  Before starting pattern analysis for research data deposit into (institutional or subject-based) data repositories, whether or not open access, the first step for Sonex is to scope ongoing projects dealing with that kind of deposit, as well as already closed projects that supplied relevant guidelines on the subject. A list of projects working on data management follows, with their specific approach to actual data deposit as taken from the project blogs:


TARDIS (Monash University–Australian National Data Service).
“There is a pressing need for the archival and curation of raw X-ray diffraction data. However, the relatively large size of these datasets has presented challenges for storage in a single worldwide repository. This problem can be avoided by using a federated approach, where each institution or university utilizes its institutional repository”.


ADMIRAL: A JISC-funded data management infrastructure for research across the life sciences.
"The purpose of the ADMIRAL Project is to create a two-tier federated data management infrastructure for use by life science researchers, that will provide services (a) to meet their local data management needs for the collection, digital organization, metadata annotation and controlled sharing of biological datasets; and (b) to provide an easy and secure route for archiving annotated datasets to an institutional repository, The Oxford University Data Store, for long-term preservation and access, complete with assigned Digital Object Identifiers and Creative Commons open access licences".
(See Oxford University Library Services' Databank)


XYZ Project. “The XYZ Project will create a demonstrator of a new workflow for publishing data in support of full-text. The author prepares data for publication (if possible with validation) in a third-party trusted repository before the paper is submitted to a publisher. Our software will manage the deposition, release to reviewers, dis-embargo and for conventional publication or as a data journal. Two Open Access publishers (International Union of Crystallography and BioMed Central) are engaged with the project and will test the new workflow”.
Anticipated Outputs and Outcomes: A demonstrator repository hosted by the IUCr.


FISHnet: Freshwater information sharing network. “This project will allow researchers in multiple academic, governmental and voluntary-sector institutions to share their data. Data will be held securely in a sustainable subject repository which preserves and disseminates multiple datasets as part of the FreshwaterLife.org information portal. Data creators will be able to manage access rights to their content, from Open Access to sharing with trusted colleagues”.


DMBI: Data Management in Bio-Imaging. “The quantity of data generated by modern high-throughput bio-imaging systems presents a significant challenge in both data management and processing. Furthermore, there is no explicit system/way to record the processing algorithms and parameters that are used to produce results. Thus there is no strong link between images, software and results. This projects aims to address these issues”.
Anticipated Outputs and Outcomes: Build a prototype DMBI system around OMERO.


CaiRO: Curating Artistic Research Output. “No prominent subject-based repository exists to act as the custodians of arts practice-as-research data. Where institution provision for data management is in place (for instance, an institutional repository service) the arts researcher-practitioner cannot always rely on an understanding of the special nature of arts research data. More commonly, data is retained in departmental collections, built and maintained by small teams which often include researchers themselves”.


BRIL: Biophysical Repositories in the Lab. “The BRIL project aims to enhance the repository facilities at the Randall Division of Cell and Molecular Biophysics at King’s College London. This will involve:
» Embedding the repository within the researchers’ day-to-day research and experimental practices;
» Integrating the repository into the wider King’s infrastructure”.
Example of KCL “internal” repository: Mutation Testing Repository.


ADS+: Enhancing and Sustaining the Archaeology Data Service digital repository. The project aims to “Increase the sustainability of the ADS, by implementing Fedora (Flexible Extensible Digital Object Repository Architecture). This is a world-leading open source digital repository application which will allow the automation of many ADS curatorial functions, according to the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003). This will help ensure the long term preservation of all ADS digital archives, as well as making the ADS archival procedures more cost-effective”.


IDMB (Institutional Data Management Blueprint) Project, U. Southampton.
The project’s aims are to provide the University of Southampton with a ten-year roadmap for delivery of a comprehensive data management infrastructure.

[IDMB Recommendations] The data management audit and gap analysis indicates where improvements can be made in the short, medium and long-term to improve data management practices and capabilities at the University. The following preliminary recommendations are put forward for short (one year), medium (one to three years), long (more than three years) term action.
[Short Term (1 year)] Crucial to supporting researchers is the consolidation of data management into a coherent framework that is easy to understand, use, and has a sustainable business model behind it. A number of major recommendations are put forward here for the short-term:
• Create an institutional data repository
• Develop a scalable business model
• One-stop shop for data management advice and guidance


MaDAM: Pilot data management infrastructure for biomedical researchers at University of Manchester.
A pilot infrastructure for Biomedical Researchers at the University of Manchester, which covers data capture, data storage and data curation. This infrastructure comprises procedural support, hardware and software.
[18/03/2010] The development team have built a prototype data management front end which fits a generic set of needs amongst our Life Sciences researchers. It is aimed at being flexible enough to allow researchers themselves to assign attributes (i.e. metadata) to their experiments and datasets for them to be usefully categorised and tagged. The prototype is also entirely dispensable and intended as a catalyst for feedback from our use cases on their specific functionality requirements.


DISC-UK DataShare Project. The DISC-UK DataShare project, led by EDINA National Data Centre and the Edinburgh University Data Library, with partners at the Universities of Southampton and Oxford, has advanced the current provision of repository services for accommodating datasets in the UK.
Key conclusions: 1) Data management motivation is a better bottom-up driver for researchers than data sharing but is not sufficient to create culture change, 2) Data librarians, data managers and data scientists can help bridge communication between repository managers & researchers, 3) Institutional repositories can improve impact of sharing data over the internet.

Thursday, 3 March 2011

Repository take-up and embedding: the future of repositories


  Already in Birmingham for the JISC Deposit Project Meeting on Mar 1st, Sonex stayed in town to attend the JISC Repositories Take-Up and Embedding Meeting as well. The start-up meeting for this new JISC programme aimed to outline the future of repositories, dealing with specific issues such as (automated) deposit, shared services like RoMEO or OpenDOAR, repository integration into general software infrastructures for research information management, and the promotion of national (via RSP) and international (via KE, COAR and OpenAIRE) collaboration.

Six projects were presented at this programme start-up meeting:

- Bringing a Buzz to NECTAR (Miggie Pickton, University of Northampton)
- Hydrangea: letting the repository flower (Richard Green, University of Hull)
- MIRAGE 2011: Repository Enrichment from Archiving to Creation (Xiaohong Gao, Middlesex University)
- Enhanced interface design for supporting take-up and embedding of the Glasgow School of Art research repository, including visual engagement with practice led and applied outputs (Robin Burgess, Glasgow School of Art)
- eNova (Marie-Therese Gramstadt, VADS)
- EXPLORER: Embedding eXisting & Proprietary Learning in an Open-source Repository to Evolve new Resources (Alan Cope, De Montfort University)

An extra postprandial presentation was delivered by William Nixon on repository consolidation within a university research information management environment and the way it was done at the University of Glasgow Enlighten IR. Statements like "Silos are the past, embedding repositories -through the use of tools like Sword or LDAP- is the future" made the point on how repositories should evolve. According to William, repositories are to exploit new opportunities for data mining, business intelligence, KPIs, analytics, 'stickiness' and visibility (some of these issues being thoroughly dealt with at the Enlighten repository blog).

There was a remarkable presence of image-related projects among the presentations, with Glasgow School of Art, eNova and MIRAGE 2011 all dealing with archiving of images into repositories one way or another. This is great news for the momentum-gaining development of new information infrastructures in the area (also traceable at the JISC Deposit Programme meeting the day before), which will no doubt benefit from these projects' outcomes.

Having watched the project presentations from a Sonex point of view, it seems they could particularly benefit from interacting with the JISC Deposit projects in terms of implementing the resulting strategies for automated content ingest into repositories. A handful of the take-up and embedding projects would thus be the soundest candidates for initial "customer implementation" of the various methods coming from the Deposit strand for quick population of repositories with institutional research output (the take-up bit, prior to embedding). As these projects will run until the end of 2011 and the ones from the Deposit strand should deliver around July, interaction among them could probably be easily achieved.

One project among those presented particularly captured Sonex's attention: MIRAGE 2011, Middlesex Medical Image Repository with a Content-Based Image Retrieval Systems Archiving Environment. MIRAGE is both an image-related repository project (as it deals with medical images) and a research data project, and it is this latter feature that brings it fully within the scope of Sonex activity on research data management. Ongoing data management projects (either JISC-funded or otherwise) usually deal with either numerical or textual data, but projects dealing with the deposit of graphical research data are rare (save for the Data Management in Bio-Imaging - DMBI project run at The John Innes Centre, BBSRC, Norwich).

A couple of references were shared with MIRAGE project manager Dr. Xiaohong Gao in order to promote synergies among different projects in the same area: the 'Feeding Neuroimaging Repositories' poster presented at OR2010 Madrid last July by a team of Universitat Autònoma de Barcelona (UAB)-Hospital de la Santa Creu i Sant Pau researchers in Barcelona, and the MIDAS/National Alliance for Medical Image Computing (NAMIC) medical image repository.

The meeting presentations will shortly be available.

Wednesday, 2 March 2011

JISC Repository Deposit Programme Meeting in Birmingham


  A JISC Repository Deposit Programme meeting was held on Mar 1st, 2011 at Maple House Birmingham. Under coordination from Balviar Notay, JISC manager for the Deposit projects, presentations were delivered by representatives of the four projects presently running under the JISC Deposit call: DepositMO (Steve Hitchcock, U Southampton), DURA (John Norman, UCam), RePosit (Ian Tilsed, Leeds U) and Kultivate (Marie-Therese Gramstadt, VADS). Additional presentations were given for the deposit-related Open Access Repository Junction (OA-RJ) project (Theo Andrew, EDINA), Sword v2 (Richard Jones - Symplectic) and Sonex (Pablo de Castro, Carlos III University Madrid) projects.


Many interesting issues were raised and discussed during the presentations, and specific teamwork activities were later carried out to promote cooperation between projects. This was the first opportunity for representatives of all projects involved in the JISC Deposit programme to meet the other projects in person and learn about their progress and potentially complementary findings.

Several complementary visions of deposit were outlined during the workshop: a quite technical one from projects such as DepositMO and Sword, an advocacy-focused approach from the RePosit project, which aims to increase engagement with the repository, and, from DURA, a vision of repositories as potential suppliers of the global institutional research output required for REF purposes.

Steve Hitchcock (DepositMO, implementing Sonex use case scenario nr 4, Deposit via personal software) delivered a few demo examples of Swordv2-assisted deposit into the DepositMO test repository via the local computer file manager, including deposit of a previously parsed full-text document whose metadata was ingested as well, thus achieving the metadata+object transfer. A key question on document deposit for management vs publishing purposes was also raised during the DepositMO presentation: are repositories (or could they evolve into) a proper environment for document management, or does the Open Access philosophy prevent them from being used as cooperative tools, for example for pre-print editing by a group of authors?

The DURA and RePosit projects, implementing Sonex use case nr 2, CRIS/IR integration, are both dealing with making deposit as easy as possible for the author community: DURA by ingesting previously synced inputs from Mendeley and Symplectic Elements into IRs, and RePosit by specifically “increasing engagement with repository” through a set of awareness-raising materials and campaigns later to be shared with other projects.

Kultivate, aiming to increase deposit in the arts and design environment, is both the newest and possibly the most innovative project in the strand. Repository development having been strongly focused on research papers as a main research output, work on so far underexploited creative arts materials gives Kultivate the opportunity to set new standards and provide new resources to the Open Access repository community.


Further presentations followed for projects providing general-purpose deposit infrastructure, such as the EDINA Open Access Repository Junction (OA-RJ) middleware for discovery and Sword-assisted deposit. OA-RJ is already live-testing its broker for automated transfer of publisher or subject repository content into specific target repositories. Richard Jones described the ongoing process of developing Sword-v2, which will deliver fine-tuned functionalities for automated metadata+object transfer to the rest of the Deposit projects and the wider repository community, resulting in higher deposit rates. Finally, a Sonex presentation stressed the need to re-examine the Sonex deposit use case scenarios in order to cover new types of materials such as research data, creative arts materials and learning materials. Sonex also suggested that the common strategy for measuring the success of JISC-funded deposit projects, being designed at Birmingham City University Evidence Base, might include specific questions for repository managers, such as whether any given automated deposit strategy was used for content ingest, besides the specific success measures devised by the projects themselves.

The workshop presentations will shortly be available at the Deposit wiki. Once the Deposit projects are completed, another programme meeting will be held to share conclusions and examine case studies and success stories, so that the resulting solutions can be widely implemented.

Sunday, 16 January 2011

"On such a full sea are we now afloat"

This quotation, from W. Shakespeare's 'Julius Caesar', closed Drs. Eefke Smit's talk "Taking the Current when it Serves: Research Data from the Publisher's Perspective", delivered at 'Academic Publishing in Europe': the APE 2011 conference, held at the Berlin-Brandenburg Academy of Sciences in Berlin, Jan 11-12th, 2011.


Aiming to gather some facts for its ongoing analysis of research data management and its deposit into repositories, Sonex has just attended APE2011, a meeting for the publishing industry and its environment held yearly in Berlin since 2006. The conference organisers regularly publish a brief official report shortly after the event (reports on previous APE editions available here, report on this edition due shortly).

This particular visit to Berlin offered the chance to attend yet another event besides APE2011: the SOAP Symposium. The final report of the SOAP (Study of Open Access Publishing) project survey was presented at this one-day meeting, held on Jan 13th in the Goethe Room of the renowned Harnack-Haus in Berlin. The SOAP project describes and analyses the open access publishing landscape, and explores the risks and opportunities of the transition to open access publishing for libraries, publishers and funding agencies (see preliminary survey results; the final report will be available as of next March).

The conference programme for APE2011, entitled "Smarter Publishing in the New Decade", included promising topics such as the evolution of peer review and ways to improve it, the so-called data deluge, business opportunities in China and how Open Access is becoming increasingly mainstream within the publishing environment. Discussions on those matters were lively both at round tables and during lunch breaks. As Sonex's interest is mainly in research data management, the rest of this report focuses on presentations and debates on that subject.

On the afternoon of Tuesday Jan 11th, a session was held on “The Data Deluge: to Drown or to Swim?”, chaired by Bob M. Campbell. Herbert Gruttenmaier, INIST-CNRS, started his presentation "Helping to Ride: a look at data sharing and access policies" by reminding the audience that, since we were in Berlin, the definition of an Open Access Contribution on page 1 of the Berlin Declaration on Open Access to Knowledge includes “raw data and metadata”. Some highlights from his talk were:


  • There is a large number of Data Sharing Policies being defined by administrations, institutions, funding agencies and publishers themselves under the guideline "data should be made as freely and widely available as possible". See for instance NSF’s requirement for submission of data management plans of May 10th, 2010, under general policy statement “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants”.
    Or the very recent (Jan 10th, 2011) commitment by a group of major international funders of public health research to “work together to increase the availability of data emerging from our funded research, in order to accelerate advances in public health”.

  • Publishers such as BioMed Central were featured as high-profile supporters of Open Data (see Dec 11th, 2010 post at this blog), and the NPG editorial policy on dataset sharing was specifically mentioned during the talk, as well as the Brussels Declaration on STM Publishing statement that “Raw research data should be made freely available to all researchers”. Finally, discipline-based data policies were cited, such as the PaN-Data Scientific data Policy Draft for Scientific Data Management Framework at European Photon and Neutron Facilities or the Joint Data Archiving Policy (JDAP) adopted in a coordinated fashion by Dryad partner journals.

  • Not everything is that simple though: the Nov 2009 RIN report “Patterns of information use and exchange: case studies of researchers in the life sciences” shows that researchers are not so eager to share their data with others, and that ‘one-size-fits-all’ information and data sharing policies may not achieve the goals they are aiming for, namely scientifically productive and cost-efficient information use in the life sciences.

Drs. Eefke Smit, International Association of STM Publishers, provided a counterexample to these growing data sharing policies by publishers in her talk on "Research Data from the Publisher's Perspective": she described the Journal of Neuroscience policy of no longer taking supplementary material from authors since Nov 1st, 2010, the procedure posing too heavy a burden on paper reviewers.
She also warned of the so-called data deluge, according to which tera- and petabyte-sized datasets will increase their share in research projects in upcoming years.
However, when researchers are asked where they would like to submit their research data, the answer is more often than not "publishers". This raises the issue of research data preservation: results of an internal survey by STM Publishers show what she called “an improvable situation” with regard to preservation.

The planned talk “Data Publishing in the Context of the ICSU World Data System” by Dr. Michael Diepenbroek, Director of WDC-MARE/PANGAEA, University of Bremen, was ultimately dropped from the conference programme. However, the next speaker, Dr. Jan Brasse, Managing Director of DataCite, provided some information on the progress of one of the main databases for research data in the geosciences area, for instance by stating there was “a wide cooperation between Elsevier and PANGAEA via DOI-based external links from online papers” at the former’s platforms. This kind of cooperation between publishers and international databases for handling research data might be useful for tackling the abovementioned data preservation issues.
Dr. Brasse, affiliated with the German National Library of Science and Technology Hannover, also described the evolution of the DataCite international project as it is carried out by local member institutions: as of Dec’10, over 1M records are already registered with DOI names at datacite.org. Perspectives for the project include the setting up of a Central Metadata Base as of Jun'11; DataCite becoming a harvest point for third parties such as WoS; and cooperation via CrossRef for data-article lookup.

The data management session ended with the talk on “Managing Publication and Research Data: the eSciDoc Research Infrastructure” by Dr. Malte Dreyer from Max Planck Digital Library (MPDL). eSciDoc is a joint project of the Max Planck Society and FIZ Karlsruhe, funded by the Federal Ministry of Education and Research (BMBF), with the aim of realizing a next-generation platform for communication and publication in research organizations. Further eSciDoc projects dealing with research data management and mentioned during the presentation were ‘Astronomer’s Workbench’ (astronomy), Lifecycle Logger (biochemistry) and BW-eSci(T) for computational linguistics. The DARIAH (Digital Research Infrastructure for the Arts and Humanities) project, in whose development eSciDoc is directly involved, and the CLARIN (Common Language Resources and Technology Infrastructure) project were repeatedly highlighted during the session as leading EU projects on the development of digital research infrastructure (including data management) for the Humanities and Social Sciences.


A joint panel discussion was then held after the presentations on research data management, with speakers taking questions from the floor. Alicia Wise, Elsevier Director of Universal Access and former archaeologist, raised the issue of the costs attached to research data management and who should fund them: the panellists agreed that national funding bodies should assume the cost of data management. In her question Dr. Wise incidentally mentioned that data management at the archaeological research project she used to work for succeeded only thanks to researchers dedicating 50% of their time to data curation. This aspect of dataset deposit will be examined by Sonex in order to identify alternative (automatic) curation procedures currently being used to relieve researchers of the data curation burden.

The data management issues extended well outside the session specifically devoted to them and into the Innovation session held the next day, where the presentation by Adam Marshall (Portland Press) on the Semantic Biochemical Journal and Project Utopia at the Manchester School of Computer Science dealt extensively with data handling (see “Calling International Rescue: knowledge lost in literature and data landslide!” at Biochem J. (2009) 424, 317–333 for a review on “how to provide new ways of interacting with the literature, and new and more powerful tools to access and extract the knowledge sequestered within it”).

At the end of the data session panel discussion Dr. Eefke Smit summarised the three challenges of research data management: normalization, standardization and migration. She also reminded the audience of the verses following the one quoted in the title of this post:

(…) On such a full sea are we now afloat,
And we must take the current when it serves,
Or lose our ventures.

Saturday, 18 December 2010

Sonex at the "Digital Library Research and Open Access: Interoperability Strategies" workshop

After delivering its paper "Handling repository-related interoperability issues" last Sep at the 2nd DL.org workshop in Glasgow, Sonex will be contributing a presentation at the forthcoming DL.org "Digital Library Research and Open Access: Interoperability Strategies" one-day event to be held at the British Academy in London next Feb 4th.









The Sonex contribution will be part of this DL.org workshop dealing with digital libraries, Open Access repositories and interoperability among them. The conference programme, already available, includes presentations on the DL.org reference model, the DL.org policy and quality interoperability survey, the degree of progress of Open Access repositories with regard to interoperability issues in the UK and Europe, and research data library management, among others.

Sunday, 12 December 2010

A preliminary list of discipline-specific projects on research data management

A preliminary list follows of currently running discipline-specific projects and initiatives (as of Dec 2010) dealing with research data management. The list below is not comprehensive but rather a sample of ongoing projects, brought together in order to detect potential biases by area in current research data management projects. Should any relevant projects be missing, we’d appreciate a notification so that they can be included as well.

[projects/initiatives listed in alphabetical order]

Project name: ACRID: Advanced Climate Research Infrastructure for Data
Institution/Funder/Manager: U East Anglia, STFC, Met Office, JISC
Project Description: The ACRID Project aims to develop an approach to publishing climate research data in a way that facilitates citing, re-use and the provision of full provenance information for processed data.
Area/Discipline: Climate Science


Project name: ADMIRAL
Institution/Funder/Manager: U Oxford, JISC
Project Description: A data management infrastructure for research across the life sciences
Area/Discipline: Life Sciences


Service/Project name: ADS: Archaeology Data Service
Institution/Funder/Manager: U York, AHRC, JISC, EU (mandated repository for AHRC, NERC)
Service/Project Description: The Archaeology Data Service supports research, learning and teaching with high quality and dependable digital resources. It does this by preserving digital data in the long term, and by promoting and disseminating a broad range of data in archaeology. The ADS promotes good practice in the use of digital data in archaeology, it provides technical advice to the research community, and supports the deployment of digital technologies.
ADS is actively engaged with research projects working with partners in all sectors of UK archaeology.
Area/Discipline: Archaeology


Project name: Global Argo Data Repository
Institution/Funder/Manager: NOAA, NODC (National Oceanographic Data Center), GODAE (Global Ocean Data Assimilation Experiment), IFREMER (Institute for Research and Exploitation of the Sea)
Project Description: In the year 2000, a global array of approximately 3,000 free-drifting profiling floats, known as the Argo Ocean Profiling Network, was planned as a major component of the ocean observing system. Argo originated from the need to make climate predictions on both short and long time scales and has led to international participation and collaboration to ensure global coverage.
Centers to handle the data collected by profiling floats have been established in a number of countries. These centers normally handle data from their nationally deployed floats, but sometimes provide that service to other countries or organizations. All Argo data will be publicly available in near real-time via the GTS (Global Telecommunications System) and in scientifically quality-controlled form with a few months' delay.
Area/Discipline: Marine Sciences, Oceanography


Project name: BlueObelisk
Institution/Funder/Manager: Group of chemists/programmers/informaticians
Project Description: The Blue Obelisk Data Repository lists important chemoinformatics data such as element and isotope properties and atomic radii, including references to the original literature.
Area/Discipline: Chemoinformatics


Project name: BRIL: Biophysical Repositories in the Lab
Institution/Funder/Manager: CeRch-KCL, JISC
Project Description: The BRIL project aims to enhance the repository facilities at the Randall Division of Cell and Molecular Biophysics at King’s College London by:
- Embedding the repository within the researchers’ day-to-day research and experimental practices
- Allowing data and metadata to be captured in automated fashion
- Allowing the structure of experimental processes as a whole to be captured, modelled and stored within the repository
- Enhancing browse and access facilities and data exchange facilities to increase interoperability.
Area/Discipline: Biophysics


Project name: CAiRO: Curating Artistic Research Output
Institution/Funder/Manager: U Bristol, DCC, JISC
Project Description: Research data created by the UK’s performance and visual arts departments is often rich, technically complex and amazingly varied in nature. This work may include interconnected multimedia records of a single live event or software which exhibits complex behaviours dependent upon the choices made by a viewer. The CAiRO project, funded as part of the wider JISC Managing Research Data programme, aims to offer data management skills tailored to the special requirements of the arts researcher-practitioner.
Area/Discipline: Creative Arts


Project name: The CEACS Data Library
Institution/Funder/Manager: CEACS Library, Center for Advanced Study in the Social Sciences (CEACS), Instituto Juan March, Madrid, Spain
Project Description: The CEACS Data Library provides support to its research community in conducting quantitative research with primary and secondary data. The Data Library has a collection of over 2,000 secondary research datasets from major data centres. The service supports research data management through a thematic website, one to one support and a Dataverse data repository to help with the management, sharing and preservation of the data produced by researchers.
Area/Discipline: Social Sciences


Project name: Data Conservancy: A New Vision for Data-Driven Science
Institution/Funder/Manager: National Science Foundation (NSF), Johns Hopkins University (Lead institution)
Project Description: The Data Conservancy (DC) embraces a shared vision: scientific data curation is a means to collect, organize, validate and preserve data so that scientists can find new ways to address the grand research challenges that face society.
Area/Discipline: Astronomy, Earth Sciences, Life Sciences and Social Sciences


Project name: DataONE
Institution/Funder/Manager: National Science Foundation (NSF)
Project Description: DataONE was conceived to ensure preservation and access to multi-scale, multi-discipline, and multi-national data about life on earth and the environment that sustains this life. It was recognized from the outset that such data are often difficult to discover, access, integrate and analyze.
Area/Discipline: Earth & Life Sciences


Project name: DataTrain
Institution/Funder/Manager: U Cambridge, ADS, DCC, JISC
Project Description: The DataTrain project aims to build on findings and tools developed in the Incremental project (JISC 07/09 funding strand), to design discipline-focused data-management training modules for post-graduate courses in Archaeology and Social Anthropology at the University of Cambridge.
Area/Discipline: Archaeology, Social Anthropology


Project name: DATUM for Health: Research data management training for health studies
Institution/Funder/Manager: Northumbria U, DCC, JISC
Project Description: This collaborative project seeks to promote research data management skills of postgraduate research students in the health studies discipline through a specially-developed training programme which focuses on qualitative, unstructured research data.
Area/Discipline: Health Sciences


Project name: DMBI: Data Management in Bio-Imaging
Institution/Funder/Manager: The John Innes Centre (BBSRC), Norwich BioScience Institutes, JISC
Project Description: DMBI aims to raise the level of data management/handling for high-throughput bio-imaging, and strengthen the interactions between image data silos, both internally and with partner organisations.
Area/Discipline: Biology/Bio-imaging


Project name: DMP-ESRC: Data management planning for ESRC research data-rich investments
Institution/Funder/Manager: UK Data Archive (UKDA), Economic and Social Research Council (ESRC), Joint Information Systems Committee (JISC)
Project Description: The Data Management Planning (DMP) project aims to increase the data management and sharing capability within the social sciences community.
Area/Discipline: Social Sciences


Project name: DMTpsych: Data Management Training for psychologists
Institution/Funder/Manager: U York, U Sheffield, Sheffield Hallam U, DCC, JISC
Project Description: The aim of DMTpsych is to build capacity and skills within psychology postgraduates relating to research data management. The project builds upon existing research data management materials developed by the Digital Curation Centre (DCC) to create discipline-focused postgraduate training materials that can be embedded into postgraduate research training for the psychological sciences.
Area/Discipline: Psychology


Project name: DRYAD UK
Institution/Funder/Manager: British Library, University of Oxford, JISC
Project Description: Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences as published by a Consortium of Journals. Dryad UK aims to expand Dryad into the UK by establishing a UK mirror site and extending service to new publishers and disciplines.
Area/Discipline: Biomedical Sciences


Project name: EDgrid Central: Data Repository System for 3-D Full-Scale Earthquake Testing Facility
Institution/Funder/Manager: National Institute for Advanced Industrial Science and Technology, Japan
Project Description: A data repository system called EDgrid Central is designed for storing the huge amount of experimental data generated by a 3-D full-scale earthquake testing facility. EDgrid Central provides large storage capacity and implements a data model for the shake test in the backend. The frontend is a portal for users to retrieve the stored data by metadata search and bulk download. The system builds on NEEScentral, developed by the NEES project in the United States, enhancing its search and download functionalities according to the EDgrid users' requirements. EDgrid Central allows facility sites to have a permanent repository of the shaking table experiments, and it also enables civil engineering researchers to share their data and reports in their daily activities.
Area/Discipline: Geophysics


Project name: EIDCSR: Embedding Institutional Data Curation Services in Research
Institution/Funder/Manager: U Oxford, JISC
Project Description: The Embedding Institutional Data Curation Services in Research (EIDCSR) project aims to address the data management and curation requirements of three collaborating research groups in Oxford, by scoping their requirements and embedding selected elements of the digital curation lifecycle, including policy, workflow, and sustainability solutions, within the research process. The workflows generated by the project are intended to scale to include other research domains, and the outputs should be of use to other research-intensive institutions. The project runs until Dec 2010.
Area/Discipline: Medical & Life Sciences


Project name: ERIM: Engineering Research Information Management
Institution/Funder/Manager: U Bath, UKOLN, JISC
Project Description: ERIM aims to specify in practical terms how effective data management can be enabled and supported in research projects, in particular to support reuse or, more broadly, what can be thought of as 're-purposing'. The project will look primarily at the engineering research domain.
Area/Discipline: Engineering


Project name: EURO VO: European Virtual Observatory
Institution/Funder/Manager: CNRS, ESO, INAF, U Edinburgh
Project Description: The Virtual Observatory (VO) is an international astronomical community-based initiative. It aims to allow global electronic access to the available astronomical data archives of space and ground-based observatories and other sky survey databases. It also aims to enable data analysis techniques through a coordinating entity that will provide common standards, wide-network bandwidth, and state-of-the-art analysis tools. The EURO-VO project aims at deploying an operational VO in Europe. Its objectives are the support of the utilization of the VO tools and services by the scientific community, the technology take-up and VO compliant resource provision and the building of the technical infrastructure.
Area/Discipline: Astronomy


Project name: FISHnet
Institution/Funder/Manager: Centre for e-Research, King’s College London, JISC
Project Description: Freshwater information sharing network
Area/Discipline: Freshwater Biology


Project name: HALOGEN - History Archaeology Linguistics Onomastics and GENetics
Institution/Funder/Manager: U Leicester, JISC
Project Description: The cross-disciplinary Roots of the British collaboration between scholars in humanities and genetics seeks to interrogate the evidence for the migration and/or continuity of human populations in the British Isles in the distant past. The HALOGEN project will support the data management needs of the researchers involved and thus establish organisational best practice in terms of data management planning and the support of diverse cross-disciplinary research data.
Area/Discipline: Ancient history/Genetics


Project name: I2S2
Institution/Funder/Manager: UKOLN/DCC/Soton/STFC, JISC
Project Description: Infrastructure for integration in structural sciences
Area/Discipline: Chemistry (with a view towards inter-disciplinary application)


Project name: Incremental: A step by step approach to informing, improving, & increasing research data curation practice
Institution/Funder/Manager: Cambridge University Library, Humanities Advanced Technology and Information Institute (HATII) at U Glasgow, DCC, JISC
Project Description: The aim of Incremental is to inform, improve and increase research data curation within UK HEIs, by providing exemplars and resources for others to use. Specific objectives are: (1) to investigate current practices and requirements at each institution; (2) to develop a plan for addressing these requirements; (3) to pilot tools and services at each HEI and then make further adjustments and recommendations; (4) embed the work within each institution; and (5) to deliver resources and findings to the DCC, DPC and JISC for wider dissemination. In addition to resources, the project will seek to provide information about their cost and sustainability.
Area/Discipline: Archaeology, Chemistry, English, Engineering and Medicine


Project name: IODP: Integrated Ocean Drilling Program
Institution/Funder/Manager: National Science Foundation (NSF), Japan’s Ministry of Education, Culture, Sports, Science and Technology (MEXT)
Project Description: IODP is an international marine research program that explores Earth's history and structure recorded in seafloor sediments and rocks, and monitors subseafloor environments. IODP builds upon the earlier successes of the Deep Sea Drilling Project (DSDP) and Ocean Drilling Program (ODP), which revolutionized our view of Earth history and global processes through ocean basin exploration.
The IODP oversees repositories around the world. Samples are distributed according to ODP and IODP policies.
Area/Discipline: Marine Sciences


Project name: MaDaM
Institution/Funder/Manager: Manchester eResearch Centre, JISC
Project Description: Pilot data management infrastructure for biomedical researchers
Area/Discipline: Biomedical Sciences


Project name: Managing Research Data: Gravitational Waves (MRD-GW)
Institution/Funder/Manager: STFC, University of Glasgow, JISC
Project Description: MRD-GW aims to examine the way in which Big Science data is managed, and produce recommendations as appropriate. Gravitational Wave (GW) data generated by the LIGO Scientific Collaboration (LSC) will be used as a case study.
Area/Discipline: Particle physics/Astronomy


Project name: PANGAEA
Institution/Funder/Manager: Alfred Wegener Institute for Polar and Marine Research (AWI), DFG
Project Description: Publishing Network for Geoscientific & Environmental Data
Area/Discipline: Earth Sciences


Project name: PEG-BOARD
Institution/Funder/Manager: School of Geographical Sciences, University of Bristol, JISC
Project Description: Palaeoclimate and environment data generation - building open access to research data
Area/Discipline: Palaeoclimatology


Project name: Quixote
Institution/Funder/Manager: U Cambridge/CSIC
Project Description: The main objective of the Quixote project is to design, test and deploy a modular, open source system of tools that allow computational chemistry data (now sitting in the darkness of individual hard disks) to be organized, shared, and queried.
Area/Discipline: Quantum Chemistry


Project name: Research Data MANTRA
Institution/Funder/Manager: U Edinburgh/JISC
Project Description: Aims to develop open, online learning materials which reflect best practice in research data management grounded in three disciplinary contexts: social science, clinical psychology, and geoscience. The resulting materials will be embedded in three participating postgraduate programmes and made available through the Transkills programme for use by all postgraduate and early career researchers as well as made available generally through an open license. In addition to web-based 'chapters' that students can work through at their own pace, the course will include video interviews with leading academics about data management challenges, and practical exercises in handling data in four software analysis environments: SPSS, NVivo, R and ArcGIS.
Area/Discipline: Social and political science, Geoscience, Clinical psychology


Project name: SageCite: Citing network models of disease and associated data
Institution/Funder/Manager: UKOLN, U Manchester, British Library, JISC
Project Description: SageCite will develop and test a Citation Framework linking data, methods and publications. The domain of bio-informatics provides a case study, and the project builds on existing infrastructure and tools. Citations of complex network models of disease and associated data will be embedded in leading publications, exploring issues around the citation of data including the compound nature of datasets, description standards and identifiers.
Area/Discipline: Bioinformatics


Project name: ShareGeo Open
Institution/Funder/Manager: EDINA, JISC
Project Description: ShareGeo Open is a spatial data repository that promotes data sharing between creators and users of geospatial data.
Area/Discipline: Geography


Project name: SPQR: supporting productive queries for research
Institution/Funder/Manager: KCL, U Edinburgh, Humboldt U Berlin, JISC
Project Description: The overall aim is to investigate the potential of linked data for integrating datasets related to classical antiquity, in particular addressing the particular challenges raised by our material – its incompleteness, uncertainty and fuzziness. We will achieve this by developing mechanisms for breaking data out of silos and exposing it as linked data, using standard ontologies, and in particular the Europeana Data Model, as the semantic “glue” for linking data into a wider network of knowledge. The ultimate objective will be to create a common corpus or “RDF warehouse” of linked Classics data that can be explored, searched and enhanced by further annotations.
Area/Discipline: Classics, Epigraphy and Archaeology


Project name: SUDAMIH
Institution/Funder/Manager: University of Oxford, JISC
Project Description: Supporting data management infrastructure for the Humanities
Area/Discipline: Humanities


Project name: TARDIS
Institution/Funder/Manager: Monash University, Australian National Data Service (ANDS), University of Sydney and some other Australian institutions
Project Description: TARDIS is a multi-institutional collaborative venture that aims to facilitate the archiving and sharing of raw X-ray diffraction images (collectively known as a 'dataset') from the protein crystallography community.
Area/Discipline: Crystallography


Project name: VAMDC Project: Virtual Atomic and Molecular Data Centre
Institution/Funder/Manager: EU, CNRS, CMSUC, UCL, OU, UNIVIE, UU, KOLN, INAF, QUB, AOB, ISRAN, RFNC-VNIITF, IAO, IVIC, INASAN
Project Description: VAMDC aims to build an interoperable e-Infrastructure for the exchange of atomic and molecular data. It brings together, on the one hand, scientists from a wide spectrum of disciplines in atomic and molecular (AM) Physics with a strong coupling to the users of their AM data (astrochemistry, atmospheric physics, plasmas) and, on the other hand, scientists and engineers from the ICT community who are experienced in deploying interoperable e-infrastructure.
Area/Discipline: Astrophysics


Project name: WissGrid: Grid for Science
Institution/Funder/Manager: DFG, U Göttingen, Astrophysikalisches Institut (AIP), Alfred-Wegener-Institut (AWI), Deutsches Elektronen Synchrotron (DESY), Deutsches Klimarechenzentrum GmbH (DKRZ), Konrad-Zuse-Zentrum für Informationstechnik (ZIB), Universitätsmedizin Göttingen (UMG), Niedersächsische Staats- und Universitätsbibliothek (SUB), Technische U Dortmund (UDO), U Heidelberg, U Trier, U Wuppertal
Project Description: WissGrid’s objective is to establish long-term organisational and technical D-Grid structures for the academic world. WissGrid combines the heterogeneous needs from a variety of scientific disciplines and develops concepts for the long-term sustainable use of the organisational and technical grid infrastructure. In this context, the project aims to strengthen the organisational cooperation of scientists in the grid and to lower the entry barriers for new community grids.
Area/Discipline: Astrophysics, High Energy Physics, Climate Research, Medicine


Project name: XYZ Project
Institution/Funder/Manager: U Cambridge/IUCr/BioMed Central/Open Knowledge Foundation, JISC
Project Description: The XYZ Project will create a demonstrator of a new workflow for publishing data in support of full-text. The author prepares data for publication (if possible with validation) in a third-party trusted repository before the paper is submitted to a publisher. Our software will manage the deposition, release to reviewers and dis-embargo, whether for conventional publication or as a data journal.
Area/Discipline: Crystallography


Besides this preliminary set of ongoing discipline-specific research data projects (shortly to be complemented by Sonex with a list of general-purpose projects dealing with research data management), a thorough list of open data repositories for all areas may be found in the data repository section of the Open Access Directory (OAD).