The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for the ingest of research papers (and potentially other scholarly work) into repositories. Later, the project scope widened to include identifying and disseminating various projects being developed at institutions in relation to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.

Monday, 29 August 2011

Research data management in crystallography at the XXII IUCr Congress

  On Aug 29th a session on research data management will be held at the XXII Congress of the International Union of Crystallography (IUCr2011). The session will feature talks by Brian McMahon (IUCr), Brian Matthews (I2S2 Project), Peter Murray-Rust (CrystalEye), John Westbrook (wwPDB) and Nick Spadaccini (DDLm). Peter Murray-Rust's talk during the session will be on Open Crystallography.

Friday, 26 August 2011

STM research data management and the Quixote Project

  A one-day seminar on research data management and the Quixote Project for data management in Computational Chemistry was held yesterday, Thu Aug 25th, at the Zaragoza Scientific Center for Advanced Modeling (ZCAM). The session, entitled “Research data management: The experience of the Quixote project for Quantum Chemistry data. Can it be extended into a collection of research data management repositories?”, was attended by a rather diverse group of researchers (computational chemists as well as researchers from other disciplines) and repository managers, who came to learn about research data management initiatives and specifically about the progress of the Quixote Project, in which two researchers from the University of Zaragoza and the CSIC Institute of Physical Chemistry "Rocasolano" are involved.

The Quixote Project (see the paper "The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age", in press with J Chem Inf) is developing the infrastructure required to convert output from a number of different molecular quantum chemistry (QC) packages, such as NWChem or Gaussian, to a common, semantically rich, machine-readable format and to build repositories of QC results.

The session started with an introduction to "STM Research data management initiatives in Spain and abroad" delivered by SONEX member Pablo de Castro, in which different national approaches to RDM were presented based mainly on the information collected at the JISC MRD Programme International Workshop held last March in Birmingham.

The different approaches to data management taken by JISC and the SURF Foundation were discussed at Q&A time: for JISC, datasets are assets per se, regardless of whether they are attached to a research paper as supplementary material, whereas the 'Enhanced publication' approach of the SURF Foundation in the Netherlands regards datasets mainly as digital objects connected to research publications. It was emphasised that the upcoming OpenAIREPlus European project shares the SURF approach.

Two presentations on the Quixote Project followed: "From Databases in QC 2010, ZCAM, Sep 2010 onwards: a brief history of Quixote" by Jorge Estrada and "The Quixote Project: a pioneering work in managing Computational Chemistry research data" by Pablo Echenique. Both Quixote project members explained the results, challenges and cooperation opportunities of this RDM project, which has no specific funding of its own, and engaged in a fruitful dialogue with the attending researchers and repository managers on how the QC data assets could best be managed.

Finally, Peter Murray-Rust closed the morning's talks with some reflections on the subject "Entering a new era in data management"; see his blogpost for a summary of his ideas.

In the afternoon there were joint debates on how to improve the implementation of research data management initiatives. Researcher motivation for dataset sharing was extensively debated: ideally, this motivation should arise not just from a funding agency requiring the data to be made available, but from the sheer advantages (as summarized by Peter Murray-Rust) that sharing would bring to research practice and communication ("improving methodology").

A separate debate session was held to discuss how to start developing some kind of research data management infrastructure in countries where work in this area is only now beginning. The participants in the debate put together the following recommendations:

- A working group of IT professionals (not just library-based) should be put together to analyse the current infrastructure and the opportunities for building new initiatives on potentially reusable existing ones,

- It would be advisable to analyse researchers' behaviour and needs with regard to storing their datasets on international data-sharing platforms (where these are available for their specific discipline),

- It would be interesting to examine the motivation for data sharing among research groups in different research areas, so that initial efforts to develop data management infrastructures can start with the areas most willing to share their data (Earth Sciences come up repeatedly when the international perspective is analysed),

- Pioneering initiatives in which Institutional Repositories (such as eSpacio UNED and Digital.CSIC) provide data handling and storage services to STM researchers should be highlighted as models for others to follow,

- The OpenAIREPlus/SURF 'Enhanced publication' approach could be a good starting point for Institutional Repositories: they could find out which of the papers they currently hold have supplementary data attached at the journal site and try to manage those datasets independently,

- Discussions with researchers during the session revealed a need for a dataset management system at research centres for basic internal organisation purposes. Datasets filed in this internal storage system may or may not be intended for publication,

- Production and publication of potentially citable datasets should be acknowledged as a relevant scientific contribution for research assessment purposes,

- There are big differences in needs, procedures and required infrastructure regarding data management between Big Science and long-tail science (the greater part of research actually being done by groups of three researchers in a lab, with specific needs of their own),

- The Library is a potential supplier of know-how on data processing and storage for researchers, and that role should be promoted within institutions,

- The Spanish e-Research National Network, which mostly deals with Grid and supercomputing initiatives, might provide a good working-group framework for pioneering data management initiatives in Spain,

- There are real collaboration opportunities between the Quixote Project and the research information management infrastructure at the University of Zaragoza (two IRs are currently available: Zaguan at the University and Digital.CSIC at the Spanish National Research Council, CSIC),

- The involvement of research staff (mainly PhD students) in the management and operation of dataset management systems (such as the Chempound data repository at the University of Cambridge) seems a prerequisite for the success of data management initiatives,

- Because data characteristics differ across research areas, the incipient data management infrastructure currently available is more developed for the Social Sciences and Humanities than for STM research areas.

Saturday, 20 August 2011

Repositories and CRIS: Working Smartly Together

  Due to recent involvement in other OA repository-related activities at the University of Khartoum, reports on this blog about recent events such as the 'Repositories and CRIS: Working Smartly Together' workshop organised by RSP on Jul 19th in Nottingham and the 4th edition of the Repository Fringe in Edinburgh were slightly delayed. The good news is that interesting reports on these events have been published in the meantime (see the RSP event review by Gareth J. Johnson at the UKCoRR blog). This will allow Sonex to take a different approach to the reporting, making it more of a reflection than a description, as well as covering the conference follow-up.

One of the subjects discussed during the RePosit project session within the Conference at EMCC was which mailing list or discussion group should replace the current forum for discussing IR and CRIS-related issues once the RePosit project comes to an end. Several options were considered, from using existing lists such as UKCoRR's or ARMA's to creating a new Super-CRIS list at JISCMail. Steps are being taken after the workshop to make this new list available.

The REF is acting as a very strong driver towards CRIS implementation (with the CERIF format being widely considered as a potential standard, see Marc Cox's presentation). As a result, a good number of HEIs now operate a CRIS (whether commercial, built in-house or an extension of their EPrints repository). That is the good news. The not-so-good news may be that, because CRIS systems offer an enhanced collection of features, RIM infrastructure managers are starting to wonder whether an Open Access repository (usually managed by the Library) is becoming a somewhat redundant piece of software, with most of its functionality increasingly covered by the CRIS (managed at the Research Offices). Repository phase-out is thus beginning to be discussed at some institutions for integration and optimization purposes. However, as Janet Aucock (University of St. Andrews) writes in the reposit@googlegroups list, even if the degree of overlap between repositories and CRIS systems may be large and growing, there are still features a CRIS may not yet be able to deliver:

"(...) Another point is to do your homework really well and make absolutely sure that the CRIS can deliver everything that a repository can do. Can it provide established permanent identifiers for items? Can it handle embargoes effectively? What about stats? Does the discovery interface in the portal display all the metadata that you need with regard to open access full text eg rights statements etc. These are small details which we take for granted but are not always embedded into the CRIS. CRIS software is still evolving too, and perhaps not all the functionality necessary is there yet. Another aspect of this is the question of the interfaces for users and discovery. Is the CRIS successfully harvested or crawled by search engines. Is it ranked appropriately. Can it expose metadata appropriately to other services where required? Can it isolate metadata with full text attached/open access full text attached and allow that set to be harvested and reused? We know that our own CRIS supplier is still working on adding all the "repository" functionality that they think is needed for their product. But at the moment I don't know the fine detail of this".

Besides the R4R/CERIF4REF Project at KCL mentioned by Marc Cox, other projects dealing with CERIF implementation in CRISes were also mentioned, such as MICE (Measuring Impact under CERIF) and the BRUCE Project (Brunel Research Under a CERIF Environment), which was presented at the 2011 euroCRIS meeting in Bologna last May (see the Sonex post on the two recent euroCRIS meetings in Italy).

Another interesting outcome of this RSP event was the opportunity to learn from the local SHERPA RoMEO team about the new v2.8 version of the RoMEO API and the release of the SHERPA RoMEO Publisher's Policy Tool, which will allow publishers to define their RoMEO policies directly via an embedded portal in SHERPA (it was actually presented the next day, Jul 20th, at the 'RoMEO for Publishers' event in London).

Finally, the event's poster section featured a poster called “SICA: A CRIS with an embedded Repository working for the innovation in Andalusia Region (Spain)”. SICA is an integrated system for recording the scientific output of researchers belonging to nine universities, research organizations, technology centres and other scientific institutions in the Andalusia region of Spain. With it, the list of National & Regional CRIS/IR integration initiatives (as recorded by Sonex in its May 2010 post) keeps growing. This particular CRIS initiative is being developed within the European SISOB Project on, yet again, how to measure the impact of science on society.

Besides this Sonex list of National & Regional CRIS/IR integration initiatives, which is neither exhaustive nor systematically updated, a comprehensive list of 'CRIS + Repositories in the UK' is being put together as a conference follow-up. When complete (anyone can fill in missing entries), the list will join the RSP Wiki, where Institutional Repositories in the UK are already listed, so as to provide a clear picture of the existing infrastructure.