The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for the ingest of research papers (and potentially other scholarly work) into repositories. Later on, the project scope widened to include identifying and disseminating the various projects being developed at institutions in relation to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.






Sunday 13 March 2011

Strategies for research data deposit in ongoing data management projects


  Before performing a pattern analysis of research data deposit into (institutional or subject-based) data repositories, whether open access or not, Sonex's first step is to scope ongoing projects dealing with this kind of deposit, as well as already completed projects that have supplied relevant guidelines on the subject. A list of projects working on data management follows, each with its specific approach to actual data deposit as taken from the project blogs:


TARDIS (Monash University–Australian National Data Service).
“There is a pressing need for the archival and curation of raw X-ray diffraction data. However, the relatively large size of these datasets has presented challenges for storage in a single worldwide repository. This problem can be avoided by using a federated approach, where each institution or university utilizes its institutional repository”.


ADMIRAL: A JISC-funded data management infrastructure for research across the life sciences.
"The purpose of the ADMIRAL Project is to create a two-tier federated data management infrastructure for use by life science researchers, that will provide services (a) to meet their local data management needs for the collection, digital organization, metadata annotation and controlled sharing of biological datasets; and (b) to provide an easy and secure route for archiving annotated datasets to an institutional repository, The Oxford University Data Store, for long-term preservation and access, complete with assigned Digital Object Identifiers and Creative Commons open access licences".
(See Oxford University Library Services' Databank)


XYZ Project. “The XYZ Project will create a demonstrator of a new workflow for publishing data in support of full-text. The author prepares data for publication (if possible with validation) in a third-party trusted repository before the paper is submitted to a publisher. Our software will manage the deposition, release to reviewers, dis-embargo and for conventional publication or as a data journal. Two Open Access publishers (International Union of Crystallography and BioMed Central) are engaged with the project and will test the new workflow”.
Anticipated Outputs and Outcomes: A demonstrator repository hosted by the IUCr.


FISHnet: Freshwater information sharing network. “This project will allow researchers in multiple academic, governmental and voluntary-sector institutions to share their data. Data will be held securely in a sustainable subject repository which preserves and disseminates multiple datasets as part of the FreshwaterLife.org information portal. Data creators will be able to manage access rights to their content, from Open Access to sharing with trusted colleagues”.


DMBI: Data Management in Bio-Imaging. “The quantity of data generated by modern high-throughput bio-imaging systems presents a significant challenge in both data management and processing. Furthermore, there is no explicit system/way to record the processing algorithms and parameters that are used to produce results. Thus there is no strong link between images, software and results. This projects aims to address these issues”.
Anticipated Outputs and Outcomes: Build a prototype DMBI system around OMERO.


CaiRO: Curating Artistic Research Output. “No prominent subject-based repository exists to act as the custodians of arts practice-as-research data. Where institution provision for data management is in place (for instance, an institutional repository service) the arts researcher-practitioner cannot always rely on an understanding of the special nature of arts research data. More commonly, data is retained in departmental collections, built and maintained by small teams which often include researchers themselves”.


BRIL: Biophysical Repositories in the Lab. “The BRIL project aims to enhance the repository facilities at the Randall Division of Cell and Molecular Biophysics at King’s College London. This will involve:
» Embedding the repository within the researchers’ day-to-day research and experimental practices;
» Integrating the repository into the wider King’s infrastructure”.
Example of KCL “internal” repository: Mutation Testing Repository.


ADS+: Enhancing and Sustaining the Archaeology Data Service digital repository. The project aims to “Increase the sustainability of the ADS, by implementing Fedora (Flexible Extensible Digital Object Repository Architecture). This is a world-leading open source digital repository application which will allow the automation of many ADS curatorial functions, according to the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003). This will help ensure the long term preservation of all ADS digital archives, as well as making the ADS archival procedures more cost-effective”.


IDMB (Institutional Data Management Blueprint) Project, U. Southampton.
The project’s aims are to provide the University of Southampton with a ten-year roadmap for delivery of a comprehensive data management infrastructure.

[IDMB Recommendations] The data management audit and gap analysis indicates where improvements can be made in the short, medium and long-term to improve data management practices and capabilities at the University. The following preliminary recommendations are put forward for short (one year), medium (one to three years), long (more than three years) term action.
[Short Term (1 year)] Crucial to supporting researchers is the consolidation of data management into a coherent framework that is easy to understand, use, and has a sustainable business model behind it. A number of major recommendations are put forward here for the short-term:
• Create an institutional data repository
• Develop a scalable business model
• One-stop shop for data management advice and guidance


MaDAM: Pilot data management infrastructure for biomedical researchers at University of Manchester.
A pilot infrastructure for Biomedical Researchers at the University of Manchester, which covers data capture, data storage and data curation. This infrastructure comprises procedural support, hardware and software.
[18/03/2010] The development team have built a prototype data management front end which fits a generic set of needs amongst our Life Sciences researchers. It is aimed at being flexible enough to allow researchers themselves to assign attributes (i.e. metadata) to their experiments and datasets for them to be usefully categorised and tagged. The prototype is also entirely dispensable and intended as a catalyst for feedback from our use cases on their specific functionality requirements.


DISC-UK DataShare Project. The DISC-UK DataShare project, led by EDINA National Data Centre and the Edinburgh University Data Library, with partners at the Universities of Southampton and Oxford, has advanced the current provision of repository services for accommodating datasets in the UK.
Key conclusions: 1) Data management motivation is a better bottom-up driver for researchers than data sharing but is not sufficient to create culture change, 2) Data librarians, data managers and data scientists can help bridge communication between repository managers & researchers, 3) Institutional repositories can improve impact of sharing data over the internet.

Thursday 3 March 2011

Repository take-up and embedding: the future of repositories


  Already in Birmingham for the JISC Deposit Programme meeting on 1 March, Sonex stayed in town to attend the JISC Repositories Take-Up and Embedding meeting as well. The start-up meeting for this new JISC programme aimed to outline the future of repositories, dealing with specific issues such as (automated) deposit, shared services like RoMEO and OpenDOAR, integration of repositories into general research information management infrastructures, and the promotion of national (via RSP) and international (via KE, COAR and OpenAIRE) collaboration.

Six projects were presented at the programme start-up meeting:

- Bringing a Buzz to NECTAR (Miggie Pickton, University of Northampton)
- Hydrangea: letting the repository flower (Richard Green, University of Hull)
- MIRAGE 2011: Repository Enrichment from Archiving to Creation (Xiaohong Gao, Middlesex University)
- Enhanced interface design for supporting take-up and embedding of the Glasgow School of Art research repository, including visual engagement with practice led and applied outputs (Robin Burgess, Glasgow School of Art)
- eNova (Marie-Therese Gramstadt, VADS)
- EXPLORER: Embedding eXisting & Proprietary Learning in an Open-source Repository to Evolve new Resources (Alan Cope, De Montfort University)

An extra after-lunch presentation on repository consolidation within a university research information management environment, and on how this was done for the University of Glasgow's Enlighten IR, was delivered by William Nixon. Statements such as "Silos are the past, embedding repositories (through the use of tools like Sword or LDAP) is the future" made clear how repositories should evolve. According to William, repositories should exploit new opportunities for data mining, business intelligence, KPIs, analytics, 'stickiness' and visibility (some of these issues are discussed in depth on the Enlighten repository blog).

Image-related projects were remarkably prominent among the presentations: Glasgow School of Art, eNova and MIRAGE 2011 all deal in one way or another with archiving images in repositories. This is good news for the development of new information infrastructures in this area, which is gaining momentum (as was also apparent at the JISC Deposit Programme meeting the day before) and will no doubt benefit from these projects' outcomes.

From a Sonex point of view, the presented projects could particularly benefit from interacting with the JISC Deposit projects when implementing the resulting strategies for automated content ingest into repositories. A handful of the take-up and embedding projects would thus be the soundest candidates for initial "customer implementation" of the various methods that the Deposit strand is developing for quickly populating repositories with institutional research output (the take-up part, prior to embedding). As these projects will run until the end of 2011 and those in the Deposit strand should deliver around July, interaction between them should be easy to achieve.

One project in particular caught Sonex's attention: MIRAGE 2011 (Middlesex Medical Image Repository with a Content-Based Image Retrieval Systems Archiving Environment). MIRAGE is both an image-related repository project (it deals with medical images) and a research data project, and it is the latter feature that brings it fully within the scope of Sonex's work on research data management. Ongoing data management projects (JISC-funded or otherwise) usually deal with numerical or textual data, while projects dealing with the deposit of graphical research data are rare, one exception being the Data Management in Bio-Imaging (DMBI) project run at the John Innes Centre, BBSRC, Norwich.

To promote synergies among projects in the same area, a couple of references were shared with MIRAGE project manager Dr. Xiaohong Gao: the 'Feeding Neuroimaging Repositories' poster presented at OR2010 in Madrid last July by a team of researchers from Universitat Autònoma de Barcelona (UAB) and Hospital de la Santa Creu i Sant Pau in Barcelona, and the MIDAS/National Alliance for Medical Image Computing (NAMIC) medical image repository.

The meeting presentations will shortly be available.

Wednesday 2 March 2011

JISC Repository Deposit Programme Meeting in Birmingham


  A JISC Repository Deposit Programme meeting was held on 1 March 2011 at Maple House, Birmingham. Coordinated by Balviar Notay, JISC manager for the Deposit projects, the meeting featured presentations from representatives of the four projects currently running under the JISC Deposit call: DepositMO (Steve Hitchcock, U Southampton), DURA (John Norman, UCam), RePosit (Ian Tilsed, Leeds U) and Kultivate (Marie Therese Gramstadt, VADS). Additional presentations were given for the deposit-related Open Access Repository Junction (OA-RJ) (Theo Andrew, EDINA), Sword v2 (Richard Jones, Symplectic) and Sonex (Pablo de Castro, Carlos III University Madrid) projects.


Many interesting issues were raised and discussed during the presentations, and team activities were later carried out to promote cooperation between projects. This was the first opportunity for representatives of all the projects in the JISC Deposit programme to meet in person and learn about each other's progress and potentially complementary findings.

Several complementary visions of deposit were outlined during the workshop: a fairly technical one from projects such as DepositMO and Sword; an advocacy-focused approach from the RePosit project, which aims to increase engagement with repositories; and, from DURA, a vision of repositories as potential suppliers of the complete institutional research output required for REF purposes.

Steve Hitchcock (DepositMO, which implements Sonex use case scenario no. 4, Deposit via personal software) gave several demonstrations of Sword v2-assisted deposit into the DepositMO test repository from a local computer's file manager, including the deposit of a previously parsed full-text document whose metadata were ingested along with it, thus achieving a combined metadata+object transfer. The DepositMO presentation also raised a key question about deposit for document management versus publishing purposes: are repositories (or could they evolve into) a proper environment for document management, or does the Open Access philosophy prevent them from being used as collaborative tools, for example for the joint editing of a pre-print by a group of authors?
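The metadata+object transfer described above can be sketched roughly as follows. This is a minimal illustration, assuming the SWORD v2 convention of sending descriptive metadata as an Atom entry carrying Dublin Core terms and the file itself as a binary deposit identified by a Packaging header. It is not DepositMO's code: the function names and inputs are hypothetical, and nothing is actually sent to a repository.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for a SWORD v2 metadata deposit: Atom plus Dublin Core terms.
ATOM_NS = "http://www.w3.org/2005/Atom"
DCTERMS_NS = "http://purl.org/dc/terms/"

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("dcterms", DCTERMS_NS)


def build_atom_entry(title, creators, abstract):
    """Build a minimal Atom entry carrying Dublin Core metadata.

    Hypothetical helper: title, creators and abstract stand in for
    metadata previously parsed from a full-text document.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}title").text = title
    for name in creators:
        ET.SubElement(entry, f"{{{DCTERMS_NS}}}creator").text = name
    ET.SubElement(entry, f"{{{DCTERMS_NS}}}abstract").text = abstract
    return ET.tostring(entry, encoding="unicode")


def binary_deposit_headers(filename, in_progress=True):
    """Hypothetical helper returning HTTP headers for depositing the file (the 'object')."""
    return {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f"attachment; filename={filename}",
        # Declares the content as an opaque binary file rather than a package.
        "Packaging": "http://purl.org/net/sword/package/Binary",
        # "true" signals that further content or metadata may still follow.
        "In-Progress": "true" if in_progress else "false",
    }


entry_xml = build_atom_entry("A paper", ["Doe, J.", "Roe, R."], "Abstract text")
headers = binary_deposit_headers("paper.pdf")
```

A client would then POST the Atom entry (with Content-Type application/atom+xml;type=entry) and the file (with the headers above) to the repository's collection URI, or combine both in one multipart request, which SWORD v2 also allows. A complete Atom entry would normally also carry atom:id, atom:updated and atom:author elements.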

The DURA and RePosit projects, both implementing Sonex use case no. 2 (CRIS/IR integration), aim to make deposit as easy as possible for authors: DURA by ingesting inputs previously synced from Mendeley and Symplectic Elements into IRs, and RePosit by specifically “increasing engagement with repository” through a set of awareness-raising materials and campaigns to be shared later with other projects.

Kultivate, which aims to increase deposit in the arts and design community, is both the newest and possibly the most innovative project in the strand. Because repository development has focused strongly on research papers as the main research output, working on hitherto underexploited creative arts materials gives Kultivate the opportunity to set new standards and provide new resources for the Open Access repository community.


Presentations followed from projects providing general-purpose deposit infrastructure, such as EDINA's Open Access Repository Junction (OA-RJ), a middleware for discovery and Sword-assisted deposit. OA-RJ is already live-testing its broker for the automated transfer of content from publishers or subject repositories into specific target repositories. Richard Jones described the ongoing development of Sword v2, which will provide the other Deposit projects and the wider repository community with fine-tuned functionality for automated metadata+object transfer and should result in higher deposit rates. Finally, the Sonex presentation stressed the need to re-examine the Sonex deposit use case scenarios to cover new types of material such as research data, creative arts materials and learning materials. Sonex also suggested that the common strategy for measuring the success of JISC-funded deposit projects, being designed at Birmingham City University's Evidence Base, might include specific questions for repository managers, such as whether a given automated deposit strategy was actually used for content ingest, in addition to the success measures devised by the projects themselves.

The workshop presentations will shortly be available on the Deposit wiki. Once the Deposit projects are completed, another programme meeting will be held to share conclusions and examine case studies and success stories, so that the resulting solutions can be widely implemented.