The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for ingest of research papers (and potentially other scholarly work) into repositories. Later on, the project scope widened to include the identification and dissemination of various projects being developed at institutions in relation to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.






Friday 13 April 2012

Democratizing research data management


"A bit of competition would certainly do no harm to institutionally-driven RDM projects"
(JISC MRD Project representative during a conversation on LabArchives)

  A press release was published last week by BioMed Central announcing its partnership with LabArchives in order to provide access to an enhanced version of this Electronic Lab Notebook to all BMC journal authors. This enhanced version of LabArchives, with a default 100 MB of storage, will allow researchers to assign DOIs to every dataset submitted as supplementary material to any BMC title. LabArchives is, however, not aimed specifically at supplementary data management: on the one hand, the platform has a publisher-oriented side for supplementary dataset submission; on the other hand, it can also be used as a standard tool for general-purpose research data management. This gives researchers the opportunity to use an RDM tool regardless of their institutional affiliation, scientific discipline or the country they are working in.


Since the press release was published on Apr 4th, i.e. just before the Easter holidays, there hasn't been much discussion (yet) of its potential implications for research data management. However, this commercial software may provide an additional means of doing RDM for all research groups in the UK not currently covered by a JISC MRD project or a specific institutional data policy. Besides this, in those countries where no particular emphasis is being placed on the need for RDM initiatives, this tool might be a very useful way to promote RDM directly among researchers, removing the need for institutional data policies, funder mandates on data deposit and even support from data librarians. If we consider the double bottleneck currently preventing RDM activities from succeeding in many countries -a top-down one created by the lack of official commitment to RDM and a bottom-up one at understaffed institutional repository management teams- this BMC-LabArchives partnership could mean something close to a revolution in research data management if properly disseminated to authors, research groups and institutions.

There are of course other RDM platforms around today, such as figshare, Dryad or the growing data repository network, but by offering researchers the opportunity to publish the data they decide to share (including DOI assignment), LabArchives has in fact opened a new way of performing RDM at HEIs big and small. 100 MB -or even the 100 GB of storage offered by the subscription-based professional version of LabArchives- may not seem much storage for certain disciplines, but it will certainly serve the needs of many others, and LabArchives may also be installed locally for those centres with larger storage requirements.

While some institutional approaches to RDM infrastructure creation include the development of RDM platforms built in-house, many other institutions couldn't possibly afford the cost of such a task. In this sense, LabArchives represents an opportunity to democratize the management of research data. The main requirement for LabArchives to succeed as a fully functional alternative RDM tool is now to ensure its interoperability with other well-known data management platforms such as Dryad or the institutional data repository network. Once it achieves that, it may become a formidable competitor to the JANET-brokered UMF cloud-based infrastructure for data management - and indeed a very useful complement to it.

Saturday 24 March 2012

Northwest England DCC roadshow at the University of Salford: a report



  Although not directly related to current Sonex work on analysing requirements for dataset transfer via SWORD, the international workgroup was interested in attending a DCC roadshow to gather a view on -and provide its own input to- RDM-related training initiatives that complement the direct institutional experience in RDM acquired through JISC MRD projects. So when a chance came up to attend the Northwest England DCC roadshow at the University of Salford, we were happy to arrange with the UK Digital Curation Centre to be there at the University Library on March 20th and 21st.

Training initiatives regarding research data management were also thoroughly discussed at the 'Research Data Management: Activities and Challenges' workshop organised by the Knowledge Exchange Primary Data Workgroup in Bonn last November, which Sonex also attended (and provided a report for), so there were opportunities in Salford for identifying synergies between national and international RDM initiatives in this regard. This was also the right occasion for highlighting an example of best practice in RDM-related training activities based on local network building and promoting extensive debate on where to start and how to carry on with the work, while disseminating the appropriate tools to do it along the way.

The DCC roadshow proved to be a very effective complement indeed to the JISC MRD programme and other RDM-related initiatives for reaching the 'common university' - i.e. those where preliminary efforts -be it at researcher survey level- are taking place to build some kind of RDM infrastructure but with no particular 'official' support outside -sometimes even inside- their institutions. The event in Salford gathered representatives from many NW universities -Salford, Manchester, Liverpool, Sheffield, Leeds- and debates during the roadshow were very much enriched by the mixture of institutional profiles attending it, from librarians to research office managers to researchers to ethics committee members. The experienced DCC roadshow team -Martin Donnelly, Andrew McHugh and Patrick McCann- were also very efficient in promoting dialogue and passing on guidelines and expertise throughout the event.

A shorter schedule was applied to this NW England roadshow, which took just two days instead of three: a first day devoted to presentations on data management initiatives taking place in the region and a second day for group discussions on research data management needs and how to use tools provided by DCC to identify them (such as DAF, CARDIO or DMPOnline). The DCC Data management roadshow is also an evolving creature and there are slight variations in content among its different editions, which means that the event focus can be adapted to different levels of regional RDM implementation: emphasis can for instance be placed on advanced RDM tools such as DMPOnline where RDM initiatives are well under way, while focusing mainly on the Data Asset Framework initiative in regions where RDM is still at a preliminary implementation stage.

Highlights from the first day included a stimulating 'Towards Open Worlds' keynote speech by Professor Martin Hall, a Vice Chancellor showing an unusually high commitment to Open Access and keen to debate related issues with the audience. An inspiring presentation of the two-stage MaDAM/MiSS JISC MRD project at U of Manchester was also delivered by Meik Poschen, providing some guiding light for preliminary initiatives in RDM currently being carried out at other institutions. Finally, Day I sessions were closed with a four-expert panel discussion, in which presenters at the event were asked to stress a specific issue in RDM they considered worth deeper examination. The answers were: Cost model (Meik Poschen, UoM), Limits to researcher time availability (Rachel Kane, U Sheffield), Who should lead RDM tasks - is the Library able to? (Julie Berry, Salford U) and Research motivation as a decisive argument (Graham Pryor, DCC).

During subsequent discussions Sonex became aware of three relevant points:

- Benefits can arise regarding these issues from a deeper analysis of international RDM initiatives -including ongoing and forthcoming European projects- connected to the institutional activity in the area,

- Besides disseminating specific funder mandates, finding a way to estimate the institutional costs that universities incur by not managing their research data could be a very effective argument for engaging universities with RDM activity,

- There is much emphasis in discussions on how to train researchers, but not so much on how to set up and train a team of dedicated data librarians - this strongly depending on Library staff figures and on whether or not librarians see themselves as fit for the task.


On Day II of the roadshow the EPSRC policy framework on RDM and its implications for RDM strategy implementation at universities and research centres were discussed, and several joint RDM planning activities were carried out by different groups using DCC tools to examine aspects such as the benefits to be obtained from RDM, where each institution stands in terms of RDM implementation strategy or how to engage research groups and university management more deeply in RDM.

From a Sonex point of view, attending the roadshow proved very useful for identifying successful models of RDM training and dissemination, and we would humbly recommend giving this RDM training initiative an international profile once complete so that similar efforts may be applied in a broader context. It would also be useful if participants in the DCC roadshows could provide feedback, a few months afterwards, on the impact that taking part in the initiative has had on their institution's work on RDM implementation. Any future reporting from the DCC roadshow team on their initiative will be a very interesting read indeed, and we shall be following dissemination initiatives outside the UK -such as the talk on DMPOnline at the Future Perfect 2012 Conference in Wellington this week- and hoping they'll soon arrive in continental Europe, where their work on Data Management Plans may be particularly valuable in the near future.




Saturday 11 February 2012

SONEX work on repository interoperability to be presented at the 2nd Open Access Forum


  The communication "The SONEX Workgroup for the Analysis of Repository Interoperability Issues: a Summary of Activities" (in Spanish) presented by the JISC-funded SONEX Workgroup has been accepted for the 2nd Open Access Forum, to be held on Apr 16-17th during the INFO2012 conference in Havana, Cuba. The motto for this 2nd Open Access Forum is "Interoperability: the Basis for the Ecology of Open Access Repositories".


The selected list of topics for the 2nd Open Access Forum includes:

  • Standards for Open Access Repository (OAR) Interoperability

  • CRIS/OAR Interoperability

  • Value-Added Services based on Repository Interoperability (such as Repository Usage Aggregation Systems)

  • Linked Data and Enriched Digital Objects

  • Integration of Repositories and Electronic Publishing Platforms

  • Semantic Interoperability

  • Interoperability between Open Access Repositories and e-Learning Platforms

  • Distributed Repository Networks

Tuesday 7 February 2012

Report on the Knowledge Exchange Workshop on RDM released


  The report on the Workshop on Research Data Management held last November by Knowledge Exchange (KE) at the Wissenschaftszentrum Bonn has already been released. This report summarizes expert group discussions on RDM funding, training, infrastructure and organisation challenges held after the KE "A Surfboard for 'Riding the Wave'" report was presented at the Workshop.



Tuesday 13 December 2011

Thematic parallel session on metadata - actions to be taken


  On Day II of the JISC MRD Programme 2011-13 launch event in Nottingham, on Dec 2nd, specific subject-based discussion sessions were held among the different JISCMRD02 Projects for research data management in order to promote synergies and joint work on common issues. This is a brief report on the outcomes of the discussions at the parallel session on metadata - others were held simultaneously for Institutional, Life Sciences, Engineering or Archaeology MRD projects, whose discussions have been reported elsewhere (and there are also other posts summarizing talks for this one too).

It was really hard for some of us to pick just one of those groups, since many projects actually belonged to several strands (some lucky ones also had two representatives at the event, it should be noted). The session on metadata was attended, among others, by:

- Anna Clements (U St Andrews)
- Simon Kerridge (U Sunderland)
- Kevin Ginty (U Sunderland)
- Charlotte Pascoe (British Atmospheric Data Centre)
- Pablo de Castro (SONEX Workgroup)
- Simon Hodson (JISC MRD Programme manager)
- David Shotton (U Oxford)
- Louise Corti (UK Data Archive)
- Marco Fabiani (Queen Mary U London)
...


Discussion

Metadata standards were repeatedly discussed during the session - there was a joint (and unsuccessful) attempt to establish whether anyone knew of a registry of metadata standards for different disciplines. Representatives from the CERIF4Datasets Project, University of Sunderland, mentioned they were using the MEDIN metadata standard for their work in marine sciences data management. The Core Scientific Metadata Model (CSMD) standard, developed at STFC for the I2S2 Project, was also mentioned as an interesting approach to a multi-disciplinary metadata standard for structural sciences such as Chemistry, Materials Sciences, Earth Sciences or Biochemistry. Finally, the PIMMS Project (BADC/U Reading) mentioned Metafor as a Climate Science metadata standard and their goal of using the PIMMS software tool to generate CIM-based content.

At some point the idea caught on that metadata standards should perhaps be mandated by publishers in order to harmonise discipline-specific data description procedures. Publishers are actually involved in several very successful international RDM projects, such as Dryad, but -save for REWARD- are notably absent from JISCMRD02 projects.

Having previously developed the Semantic Publishing and Referencing (SPAR) Ontologies, David Shotton said he was now working on their extension to CERIF-based metadata description of datasets, which is closely linked to dataset CERIFication work being carried out at the CERIF4Datasets Project.


Actions

The following actions were proposed for improving the chances of metadata standard harmonisation - hence enhancing dataset discoverability:

  • Trying to locate (or otherwise collect) an already existing registry of metadata standards for different disciplines, in order to offer researchers from a given discipline an already tested metadata schema they can re-use,

  • Mapping metadata standards to each other, aiming to produce a minimum-sufficient-information metadata set that may be widely applicable across disciplines,

  • Taking steps towards organising a workshop in order to have metadata issues discussed among relevant stakeholders. The ANDS Metadata Workshop in 2010, with all its discipline-based approaches to metadata standards, might be a potential source of inspiration for this. Proposed dates for this Metadata WS were spring-summer 2012.
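
The second action above (mapping standards to each other to find a minimum common set) can be illustrated with a rough Python sketch. It is only one possible reading of the idea, not something agreed at the session: the three standards, their field names and the shared concept names are all hypothetical placeholders and are not taken from MEDIN, CSMD, Metafor or any other real schema.

```python
# Hypothetical crosswalks: for each (made-up) standard, map its own
# field names onto a shared vocabulary of concepts.
crosswalks = {
    "standard_a": {"title": "title", "originator": "creator",
                   "abstract": "description", "bbox": "spatial_coverage"},
    "standard_b": {"name": "title", "investigator": "creator",
                   "summary": "description", "instrument": "instrument"},
    "standard_c": {"dataset_title": "title", "author": "creator",
                   "notes": "description", "model": "model_name"},
}


def common_core(crosswalks):
    """Return the concepts that every standard is able to express."""
    concept_sets = [set(mapping.values()) for mapping in crosswalks.values()]
    return set.intersection(*concept_sets) if concept_sets else set()


def to_core(record, mapping, core):
    """Translate one record into the common core, dropping other fields."""
    return {mapping[field]: value
            for field, value in record.items()
            if field in mapping and mapping[field] in core}


core = common_core(crosswalks)

# A made-up record described with the hypothetical "standard_b".
example = to_core(
    {"name": "Crystal scan 42", "investigator": "A. Researcher",
     "instrument": "X1"},
    crosswalks["standard_b"],
    core,
)
```

In this toy case the common core is the set of concepts shared by all three crosswalks, and discipline-specific fields such as the instrument are left out of the translated record. A real mapping exercise would of course need agreed concept definitions, which is what a registry or workshop would have to provide.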


Finally, there was a wrap-up by the different subject-based project groups, which showed strong possibilities for more stable cooperation among them (Biomedical/Healthcare projects even discussed the possibility of building a common wiki). Some cooperation frameworks (googlegroups, mailing lists) might be set up to promote this discipline-based cross-project collaboration. Regarding the metadata strand, it should be noted that metadata was also an issue in discussions held at most subject-specific workgroups, so it could potentially attract contributions from all of them.

Friday 2 December 2011

The dawn of a new JISC MRD programme - Day I



  After a successful first stage of the JISC Managing Research Data (MRD) Programme (2009-2011), a second phase of JISC MRD was launched yesterday at the NCSL Conference Centre in Nottingham, as part of a 2-day event that will continue today. The JISC MRD02 Programme includes 27 projects classified in three different strands:

Strand A. Research Data Management Infrastructure: 17 projects, to be completed from Mar to Jul 2013, comprising Institutional Pilot projects, Institutional Embedding and Transition to Service projects, Disciplinary projects for creative arts and archaeology, and a Metadata project,

Strand B. RDM Planning: 8 projects running until Mar 2012, aiming to design and implement data management plans and supporting services for researchers,

Strand C. Enhancing DMPOnline projects: 2 projects, aiming to customize and enhance the DCC DMPOnline Tool to improve its interaction with institutional/disciplinary information systems.

It is worth noting that a number of funded RDM projects in this 2nd programme stage are building upon previous pilot work (projects carried out during JISC MRD programme 2007-2011) in order, for instance, to extend and embed data management services across the whole institution.

In describing the research data management programme, Simon Hodson, JISC MRD programme manager, mentioned there will be two further JISC MRD calls as early as Jan 2012, dealing with:

- Research data publications, aiming to build partnerships among involved stakeholders and encouraging data citation and publication,

- RDM Train, aiming to design and implement data management training strategies for specific disciplines and support roles (including librarians), to be performed by linking to professional bodies.

Emphasis will also be placed during this 2nd JISC MRD programme stage on evidence gathering for project benefits and impact. A session devoted to these issues will be held on Dec 2nd, with practical work on both the Benefits Framework Tool and the Value Chain Impacts Tool. Developing metrics for measuring project impact is a specific programme goal for this 2nd implementation stage.

Project blogging

Another main JISCMRD02 objective -and one closely related to impact measurement- is to promote project dissemination and interaction among projects and with the broader community via blogging. A specific presentation on 'blogging practices to support project work' was delivered for the purpose by Brian Kelly, UKOLN. The presentation highlighted the relevance of publishing project blogposts as an alternative means of expression to writing research papers or code, and engaged the audience in finding shared views regarding the potential benefits blogging may bring to RDM projects, also providing some useful technical advice along the way.

Subsequent discussion focused on pros and cons of blogging as a communication technique (both from regular bloggers' and researchers' viewpoint), as well as on potential advantages of JISCMRD project blog aggregation, with a common RSS feed embedded back into the JISC site.

Parallel sessions and poster-session networking

Two parallel sessions came afterwards, dealing with two principal RDM issues. The first one, on DCC Tools, introduced the Data Asset Framework (DAF), DMPOnline and CARDIO, and was summarized by Paul Stainthorp, U Lincoln, in his JISCMRD02 Day I blogpost.

The 2nd parallel session dealt with UMF Tools and related RDM projects. This 2nd session featured presentations by John Milner on JANET Brokerage and Andy Powell on the Eduserv Cloud Pilot, during which the strategy for Academic Cloud service implementation was described - based on the "work with the willing" driving line. The Dynamic Purchasing System (DPS) -originally developed for utilities such as water or electricity- will be re-used as the purchasing framework for cloud-related services. Regarding Eduserv, a 2-month 'introductory tier' will be available (just for institutions) during the gradual implementation of the service (storage is currently single-site, with no backups at this pilot stage, though there are plans to offer tape backup for part of the stored infrastructure).

After an interesting Q&A session, in which backup was suggested to be an absolute requirement for the success of the initiative and there were questions on various details of Eduserv usage (such as the possibility of using departmental purchase orders instead of credit cards for academic use), five projects from the UMF strand were briefly presented which are already working on a SaaS approach, in the cloud, or both: these were BRISSkit (Jonathan Tedds, U Leicester), DataFlow (David Shotton, U Oxford), Smart Research Framework (or ELB software as a service, Tim Parkinson, U Southampton), VIDaaS and YouShare Projects. Slides for these presentations will shortly be available and will be linked from here.

Finally, the official Day I programme ended with a poster session and networking event, which was a really good opportunity for RDM projects to interact with each other and with 'fellow travellers'. Synergies among projects became quite evident when all of them were displayed together on a set of panels, and having their representatives available and willing to discuss each project's aims, challenges and similarities to others offered a very good chance to get the general picture along with the details, as well as to establish inter-project liaisons that went on well past closing time.



Sunday 6 November 2011

euroCRIS Membership Meeting – Autumn 2011, Lille, France



  On Nov 2-3 the autumn 2011 euroCRIS membership meeting was held at the University of Lille 3 in Lille, France. Attendees from 14 countries (13 European nations plus Canada) met for two days at the Univ-Lille3 Maison de Recherche to learn about the new CERIF 1.3 version (to be released in Dec 2011) and the growing number of CERIF-based CRIS implementations in Europe, with a special focus on French ones (see event programme).

Brigitte Joerg, euroCRIS CERIF Task Group Leader and German Research Center for Artificial Intelligence (DFKI), delivered a CERIF v1.3 tutorial at the beginning of the membership meeting. After a general-purpose introduction to CERIF, CRIS Systems and the euroCRIS Group for first-time meeting attendees, the tutorial went on to describe the new features in the CERIF 1.3 release (CERIF versions will no longer be named by their year of release, as they have been so far). Such features include the so-called Infrastructure entities (Facility, Equipment, Service) that have been added to the already existing CERIF Entity Types, namely Base entities (Project, Person, Organisational Unit), Result entities (ResultPublication, ResultPatent, ResultProduct), Second Level entities and Link entities.


Furthermore, the JISC RIM2 MICE Project outcomes (Measuring Impact Under CERIF) have also been brought into the CERIF 1.3 release under the Measurement & Indicator section. MICE was one of the RIM2 projects –together with CERIFy, BRUCE and IRIOS- presented last September at the JISC programme workshop in Manchester. MICE finished in July 2011 and aimed to “examine the potential for encoding systematic and structured information on research impact in the context of the CERIF schema. MICE aims to build on previous work on impact by producing a comprehensive set of indicators which will then be mapped both to the CERIF standard and the CERIF4REF schema created by the previous Readiness for REF (R4R) Project”. MICE-inspired CERIF 1.3 updates include the creation of a new CERIF table, namely the impact measure table, as well as a set of impact indicators: categories that include such concepts as improving performance of existing businesses, improved health outcomes and cultural enrichment. euroCRIS was also involved in the RIM2 UKOLN-led CERIFy Project, dealing with measures of esteem, whose results also informed the new CERIF Measurement & Indicator definition.

Another new feature in this CERIF release is geographic bounding boxes, which will allow displayed information to be restricted to a given geographic area. Geographic bounding boxes are presently defined as squares, thus leaving room for geolocation improvement in future CERIF versions. Finally, a new Linked Open Data (LOD) CERIF Task Group is being planned by euroCRIS.
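
To make the bounding-box idea a little more concrete, here is a minimal Python sketch of how records might be restricted to a geographic area. It is purely illustrative: the class names, field names (west, south, east, north) and the sample records with approximate coordinates are assumptions made for this example and are not taken from the CERIF 1.3 specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical box given by its edges in decimal degrees."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        # A point is inside when it lies between both pairs of edges.
        return (self.west <= lon <= self.east
                and self.south <= lat <= self.north)


@dataclass(frozen=True)
class Record:
    """Hypothetical CRIS record carrying a single point location."""
    title: str
    lon: float
    lat: float


def filter_by_box(records, box):
    """Keep only the records located inside the bounding box."""
    return [r for r in records if box.contains(r.lon, r.lat)]


records = [
    Record("Lille project", 3.06, 50.63),   # approximate coordinates
    Record("Bergen project", 5.32, 60.39),  # approximate coordinates
]

# A 2 x 2 degree box around northern France, invented for the example.
box = BoundingBox(west=2.0, south=50.0, east=4.0, north=52.0)
selected = filter_by_box(records, box)
```

With these made-up values only the Lille record falls inside the box. The sketch also hints at why a plain rectangle in degrees is a coarse tool: it cannot follow an irregular region, and it does not handle a box that crosses the 180° meridian.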

As a result of these new features, changes in the CERIF 1.3 release include a whole set of new entities (such as cfMedium as a new Document Type) and new attributes, as well as the removal of some outdated attributes. The new CERIF version described at the tutorial was a preview, with features such as the XML Data Exchange Format Specification and CERIF Formal Semantics still being worked on until version 1.3 is finally released next December.

A euroCRIS Overview Session followed the CERIF Tutorial, during which different members of the euroCRIS Board reported on recent activity. Keith Jeffery highlighted the euroCRIS Rome Declaration on CRIS/IR integration issued earlier this year and mentioned that while CERIF can generate multiple metadata standards such as DC, MODS, etc, the usual qDC-based OAR metadata model was insufficiently accurate, so some integration should be sought following the model CRIS-Publications OAR-Data/Software OAR.


Other euroCRIS-related activity includes the EU FP7 OpenAIRE Project moving from qDC to some semi-CERIF standard, as well as the fact that the OpenAIRE+ Project will use CERIF. By definition, CERIF serves a multiple-institution scheme (thus allowing for wider context-related information sharing for purposes such as the Research Excellence Framework assessment in the UK), so there’s also a clear need to operate internationally so as to demonstrate CERIF interoperability capabilities.

Harry Lalieu, euroCRIS Secretary, announced the CRIS2012 Conference, to be held in Prague next June, and the 2012 euroCRIS membership meetings, which will take place in Prague just before the CRIS2012 event and possibly in Spain later next year.

Anne Asserson, Universitetet i Bergen and responsible for euroCRIS strategy, announced dataset management as the next environment CERIF will be moving into (with projects such as the University of Sunderland-led CERIF for Datasets paving the way for such a move).

Speaking on behalf of Ed Simons, Universiteit Nijmegen and euroCRIS website manager, Keith Jeffery informed the audience that a test CRIS is being planned for inclusion on the euroCRIS site, thus allowing for future live demos and functionality analysis.

Within the euroCRIS Task Group reports, Brigitte Joerg mentioned that the euroCRIS Board-authored paper “Towards a Sharable Research Vocabulary (SRV) - A Model-driven Approach” had been presented at the Metadata and Semantics Research Conference (MTSR 2011) held last October in Izmir, Turkey. A preliminary meeting with the Virtual Open Access Agriculture & Aquaculture Repository (VOA3R) Project was also recently held in Madrid in order to plan the future euroCRIS Linked Open Data (LOD) Task Group.

Nikos Houssos, NDC Athens and Task Group Projects leader, mentioned the running EC FP7 Projects euroCRIS is involved in, such as ENGAGE, dealing with Open Access to Public Sector Information, EuroRIs-Net, one of whose outputs is an online CERIF database of RI stakeholders, and OpenAIRE+. UK/JISC Projects such as CERIFy, CRISPool, BRUCE, IRIOS, MICE or RMAS were also cited as proof of CERIF gradually becoming a common standard for RIM Programme Projects. Many of those projects have active euroCRIS involvement.

Danica Zendulková, CVTISR and CRIS-IR Interoperability Task Group leader, announced upcoming TG work along lines such as defining use cases for CRIS/IR interoperability, defining a model of integration interface (including XML data exchanges and web services), implementing an authority file model with attached persistent ID and promoting cooperation between CRIS/OAR communities.

Finally, David Baker, CASRAI and euroCRIS Architecture Task Group manager, explained the way towards the Reference CRIS implementation. According to implementation plans, a test CRIS should be available on the euroCRIS site in June 2012.

Several sessions –see euroCRIS meeting presentations- followed the euroCRIS Overview, summarizing recent and forthcoming developments in CRIS and CERIF implementation. An interesting discussion was also held, led by Joachim Schöpfel, on teaching CRIS Systems to his Information Science students at Université de Lille and on potential CERIF application to the teaching environment and scholarly activities beyond research.

A particularly relevant presentation –as it described CERIF-based CRIS implementation in the UK, where CERIF standard adoption has been most successful so far– was “CERIF UK landscape” by Rosemary Russell, UKOLN (final report to be formally published later this year by UKOLN-University of Bath). Some figures were mentioned at the presentation: 17 PURE/Atira CERIF-based CRIS were implemented in the UK over the last year, plus 5 Converis/Avedas CRISes and a large number of Symplectic Elements.


The CERIF UK Landscape Project carried out a set of seven interviews among ‘CRIS Project managers’ from different institutions - based at the institutional Research Office (2), Library/Info Services (4) or IT Department (1)- in order to gather their views on the implementation process, CRIS reception by end-users (researchers) and staff, plus experience of CERIF and integration with Institutional Repositories. A summary of the –often not so encouraging– answers is available in the presentation, CERIF being perceived by many as a far too complicated standard whose management would be better handed over to the commercial CRIS provider. It is a fact, however, that institutions running a CERIF-based CRIS are in a much better position to deal with the REF requirements.