The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for ingest of research papers (and potentially other scholarly work) into repositories. Later on, the project scope widened to include identifying and disseminating the various projects being developed at institutions in relation to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.






Showing posts with label Research Infrastructure. Show all posts

Sunday, 6 November 2011

euroCRIS Membership Meeting – Autumn 2011, Lille, France



  On Nov 2-3 the autumn 2011 euroCRIS membership meeting was held at the University of Lille 3 in Lille, France. Attendees from 14 countries (13 European nations plus Canada) met for two days at the Univ-Lille3 Maison de Recherche to learn about the new CERIF 1.3 version (to be released Dec 2011) and the growing number of CERIF-based CRIS implementations in Europe, with a special focus on French ones (see event programme).

Brigitte Joerg, euroCRIS CERIF Task Group Leader and German Research Center for Artificial Intelligence (DFKI), delivered a CERIF v1.3 tutorial at the beginning of the membership meeting. After a general-purpose introduction to CERIF, CRIS Systems and the euroCRIS Group for first-time meeting attendees, the tutorial went on to describe the new features in the CERIF 1.3 release (CERIF versions will no longer be named by their year of release, as they have been so far). Such features include the so-called Infrastructure entities (Facility, Equipment, Service) that have been added to the already existing CERIF Entity Types, namely Base entities (Project, Person, Organisational Unit), Result entities (ResultPublication, ResultPatent, ResultProduct), Second Level entities and Link entities.
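
The grouping of entity types described above can be pictured with a small sketch. This is only an illustration of the idea of typed entities connected through link records, not the CERIF schema itself: the class names, the `role` field and the identifiers are assumptions made up for the example, and Second Level entities are left out.

```python
from dataclasses import dataclass
from enum import Enum


class EntityGroup(Enum):
    BASE = "Base"
    RESULT = "Result"
    INFRASTRUCTURE = "Infrastructure"


# Entity type names as listed in the tutorial summary above.
ENTITY_GROUPS = {
    "Project": EntityGroup.BASE,
    "Person": EntityGroup.BASE,
    "OrganisationalUnit": EntityGroup.BASE,
    "ResultPublication": EntityGroup.RESULT,
    "ResultPatent": EntityGroup.RESULT,
    "ResultProduct": EntityGroup.RESULT,
    "Facility": EntityGroup.INFRASTRUCTURE,
    "Equipment": EntityGroup.INFRASTRUCTURE,
    "Service": EntityGroup.INFRASTRUCTURE,
}


@dataclass(frozen=True)
class Entity:
    kind: str        # one of the keys of ENTITY_GROUPS
    identifier: str  # hypothetical local identifier

    def __post_init__(self):
        if self.kind not in ENTITY_GROUPS:
            raise ValueError(f"unknown entity type: {self.kind}")

    @property
    def group(self) -> EntityGroup:
        return ENTITY_GROUPS[self.kind]


@dataclass(frozen=True)
class Link:
    """A link record relating two entities (the role name is an assumption)."""
    source: Entity
    target: Entity
    role: str


def linked_to(entity, links):
    """Return every entity that the given entity points to through a link."""
    return [link.target for link in links if link.source == entity]
```

For instance, a Project could be linked to a piece of Equipment with `Link(project, microscope, "uses")`, after which `linked_to(project, links)` would return the equipment record.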


Furthermore, the JISC RIM2 MICE Project outcomes (Measuring Impact Under CERIF) have also been brought into the CERIF 1.3 release under the Measurement & Indicator section. MICE was one of the RIM2 projects (together with CERIFy, BRUCE and IRIOS) presented last September at the JISC programme workshop in Manchester. MICE finished in July 2011 and aimed to “examine the potential for encoding systematic and structured information on research impact in the context of the CERIF schema. MICE aims to build on previous work on impact by producing a comprehensive set of indicators which will then be mapped both to the CERIF standard and the CERIF4REF schema created by the previous Readiness for REF (R4R) Project”. MICE-inspired CERIF 1.3 updates include the creation of a new CERIF table, namely the impact measure table, as well as a set of impact indicators: categories that include such concepts as improving performance of existing businesses, improved health outcomes and cultural enrichment. euroCRIS was also involved in the RIM2 UKOLN-led CERIFy Project, dealing with measures of esteem, whose results also informed the new CERIF Measurement & Indicator definition.

Another new feature for this CERIF release is the Geographic bounding boxes, which will allow displayed information to be restricted to a given geographic area. Geographic bounding boxes are presently defined as squares, thus leaving room for geolocation improvement in future CERIF versions. Finally, a new Linked Open Data (LOD) CERIF Task Group is being planned by euroCRIS.
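
As a rough illustration of how a bounding box can restrict displayed information to an area, here is a minimal sketch. It assumes a box given by its west, south, east and north limits in decimal degrees, records carrying hypothetical `lon` and `lat` fields, and a box that does not cross the 180° meridian; none of these names come from the CERIF specification, and the coordinates in the usage note are approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    # Limits in decimal degrees; assumes west <= east (no antimeridian crossing).
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        """True when the point lies inside the box or on its edge."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def restrict_to_area(records, box):
    """Keep only the records whose coordinates fall inside the box."""
    return [r for r in records if box.contains(r["lon"], r["lat"])]
```

With a box of roughly `BoundingBox(west=1.5, south=49.9, east=4.3, north=51.1)`, a record located in Lille (about 3.06 E, 50.63 N) would be kept while one located in Prague (about 14.42 E, 50.09 N) would be filtered out.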

As a result of these new features, changes in the CERIF 1.3 release include a whole set of new entities (such as cfMedium as a new Document Type) and new attributes, as well as the removal of some outdated attributes. The new CERIF version described at the tutorial was a preview, with features such as the XML Data Exchange Format Specification and CERIF Formal Semantics still being worked on until version 1.3 is finally released next December.

A euroCRIS Overview Session followed the CERIF Tutorial, during which different members of the euroCRIS Board reported on recent activity. Keith Jeffery highlighted the euroCRIS Rome Declaration on CRIS/IR integration issued earlier this year and mentioned that while CERIF can generate multiple metadata standards such as DC, MODS, etc, the usual qDC-based OAR metadata model was insufficiently accurate, so some integration should be sought along the model CRIS-Publications OAR-Data/Software OAR.


Other euroCRIS-related activity includes the EU FP7 OpenAIRE Project moving from qDC to some semi-CERIF standard, as well as the fact that the OpenAIRE+ Project will use CERIF. By definition, CERIF serves a multiple-institution scheme (thus allowing for wider context-related information sharing for purposes such as the Research Excellence Framework assessment in the UK), so there’s also a clear need to operate internationally so as to demonstrate CERIF interoperability capabilities.

Harry Lalieu, euroCRIS Secretary, announced the CRIS2012 Conference to be held in Prague next June, and the 2012 euroCRIS membership meetings, which will take place in Prague just before the CRIS2012 event and possibly in Spain later next year.

Anne Asserson, Universitetet i Bergen and responsible for euroCRIS strategy, announced dataset management as the next environment CERIF will be moving into (with projects such as the University of Sunderland-led CERIF for Datasets paving the way for such a move).

Speaking on behalf of Ed Simons, Universiteit Nijmegen and euroCRIS website manager, Keith Jeffery informed the audience that a test CRIS is being planned for inclusion at the euroCRIS site, thus allowing for future live demos and functionality analysis.

Within the euroCRIS Task Group reports, Brigitte Joerg mentioned the euroCRIS Board-authored paper “Towards a Sharable Research Vocabulary (SRV) - A Model-driven Approach” having been presented at the Metadata and Semantics Research Conference (MTSR 2011) held last October in Izmir, Turkey. A preliminary meeting with Virtual Open Access Agriculture & Aquaculture Repository (VOA3R) Project was also recently held in Madrid in order to plan the future euroCRIS Linked Open Data (LOD) Task Group.

Nikos Houssos, NDC Athens and Task Group Projects leader, mentioned the running EC FP7 Projects euroCRIS is involved in, such as ENGAGE, dealing with Open Access to Public Sector Information, EuroRIs-Net, one of whose outputs is an online CERIF database of RI stakeholders, and OpenAIRE+. UK/JISC Projects such as CERIFy, CRISPool, BRUCE, IRIOS, MICE or RMAS were also cited as proof of CERIF gradually becoming a common standard for RIM Programme Projects. Many of those projects have active euroCRIS involvement.

Danica Zendulková, CVTISR and CRIS-IR Interoperability Task Group leader, announced upcoming TG work along lines such as defining use cases for CRIS/IR interoperability, defining a model of integration interface (including XML data exchanges and web services), implementing an authority file model with attached persistent ID and promoting cooperation between CRIS/OAR communities.

Finally, David Baker, CASRAI and euroCRIS Architecture Task Group manager, explained the way towards the Reference CRIS implementation. According to implementation plans, a test CRIS should be available at the euroCRIS site in June 2012.

Several sessions –see euroCRIS meeting presentations- followed the euroCRIS Overview, summarizing recent and forthcoming developments in CRIS and CERIF implementation. An interesting discussion was also held, led by Joachim Schöpfel, on teaching CRIS Systems to his Information Science students at Université de Lille and on potential CERIF application to the teaching environment and scholarly activities beyond research.

A particularly relevant presentation, as it described CERIF-based CRIS implementation in the UK, where CERIF standard adoption has been most successful so far, was “CERIF UK landscape” by Rosemary Russell of UKOLN (final report to be formally published later this year by UKOLN-University of Bath). Some figures were mentioned at the presentation: 17 PURE/Atira CERIF-based CRIS were implemented in the UK over the last year, plus 5 Converis/Avedas CRISes and a large number of Symplectic Elements.


The CERIF UK Landscape Project carried out a set of seven interviews among ‘CRIS Project managers’ from different institutions - based at the institutional Research Office (2), Library/Info Services (4) or IT Department (1)- in order to gather their views on the implementation process, CRIS reception by end-users (researchers) and staff, plus experience on CERIF and integration with Institutional Repositories. A summary of the –often not so encouraging– answers is available at the presentation, CERIF being perceived by many as a far too complicated standard whose management would rather be handed over to the CRIS commercial provider. It is a fact however that institutions running a CERIF-based CRIS are in a much better position to deal with the REF requirements.

Wednesday, 19 October 2011

MaDAM: A JISC MRD Project for Research Data Management in the Biosciences... on the move


  Being in Manchester for the JISC Research Information management (RIM2) event, Sonex didn’t miss the opportunity it provided for paying a visit to the University of Manchester John Rylands University Library and meeting the JISC MRD MaDAM Project team. The 'MaDAM Pilot data management infrastructure for biomedical researchers at University of Manchester' was funded by the JISC Managing Research Data Programme from Oct 2009 to Jun 2011 and has provided an inspiring example of how to start building an institutional research data management infrastructure almost from scratch.

In order to start developing this RDM infrastructure (see the Project Final Report for details), MaDAM focused on a set of research groups from the biomedical sciences strand aiming to learn about the ways they dealt with data management and to provide them -with their own close involvement- with tools to improve and standardise such practices. Selected research groups -Electron and Standard Microscopy group and Magnetic Resonance Imaging (MRI) Neuropsychiatry Unit- were chosen due to their common need to deal with large images as their main source of research data.

The project's focus on a rather narrow research scope was one of the keys to its success, since it made it possible to define common ways of dealing with the information, eg at metadata level. The MaDAM planning included further extension of the RDM strategy to other research groups within the UoM based on the lessons learnt from its application to the few initially selected groups. The MiSS Project (MaDAM into Sustainable Service), funded by the JISC MRD Programme 2011-2013, will deal with extending and widening the RDM strategy to the whole of the UoM research work over the coming years.


An Oracle APEX-based research data management application was developed by MaDAM for the UoM research groups concerned, later to be revamped in order to adapt it to the regular software standards applied at UoM. Frequent meetings were held with researchers during the application development so their feedback could be collected to ensure it would meet their needs. Storage needs per researcher per year were estimated (at around 500 GB), a metadata standard for specific data description was devised and stored in the RDM application, and work was carried out with interoperability issues in mind, both with the University CRIS in order to automatically populate Grant and Project information attached to datasets, and with the UoM Fedora-based eScholar IR, where final-version datasets would be transferred via Sword for dissemination, sharing and re-use.


During the MaDAM Project several conceptual needs regarding the implementation of a solid RDM infrastructure across the UoM (and beyond) were identified and later included in the Project Final Report. The main two are the following:

- Some means of academic recognition of data-related work by researchers should be put in place in order to promote their involvement in RDM schemas and the adoption of common practices,

- A research data management policy should be adopted by the University of Manchester similar to the one issued at U Edinburgh so that some guidelines are established for providing support to researcher RDM tasks.

The gradual roll-out of MaDAM to other UoM research groups will face a set of challenges, research data being so discipline-specific. However, plans for such an extension and for ensuring the required institutional support were designed during MaDAM development (which saw a number of additional UoM research groups express interest in taking part in the pilot project) and extension work will start soon.

Friday, 14 October 2011

CERIFying Research Information Systems... and Research Data


  A couple of weeks ago Sonex attended the JISC Research Information management (RIM2) event at MCC Manchester. It was a very good opportunity to review the four JISC-funded projects (BRUCE at Brunel, IRIOS at Sunderland, CERIFy at UKOLN and MICE at KCL) dealing with CERIF implementation for research information management purposes. A report on the event should be available shortly, along with the slides presented at the event.

During this one-day meeting the CERIF for Datasets (C4D) Project was mentioned as an extension of the IRIOS Project to dataset management at the University of Sunderland. As stated in the project presentation, C4D aims to 'CERIFy' existing research dataset metadata conventions, and hence provide access to research data in an environment which also holds information on research projects and research outputs. C4D will also explore the commonality of research dataset metadata, and how much can be represented in CERIF.

Saturday, 17 September 2011

Progress on Researcher ID initiatives: IRISC 2011 Helsinki


  
The problem with names...

Prof. Carlos Martínez-Alonso is a renowned Spanish senior biochemist. He was actually President of the Spanish National Research Council (CSIC) when the Berlin Declaration was signed by the institution in January 2006. Prof. Martínez-Alonso has published hundreds of papers in high impact factor journals. However, when retrieving a complete list of his publications from PubMed database, you find out it is not possible unless several parallel author queries are carried out: there is a Martinez-A C entry under which most of his publications get listed [222]. But then there's also Martinez-Alonso C [21] and even Alonso CM [1].

It might be argued it's all about funny Spanish names with two surnames in them. That's a problem alright. Not just for Spanish names though: it's quite the same for Portuguese/Brazilian authors as well. Not to mention transliteration of Asian author names (see "Which Wei Wang?" Phys Rev 2007 editorial). PubMed is presently running its Author ID project in order to tackle this problem, which is by no means exclusive to them: around 2/3 of the over 6 million authors in MEDLINE share a last name and first initial with at least one other author, and an ambiguous name refers to 8 persons on average (Torvik and Smalheiser, "Author name disambiguation in MEDLINE").
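
The "last name plus first initial" collision mentioned above is easy to reproduce. The sketch below is only an illustration of how such ambiguous groups arise, not the method used by PubMed or by Torvik and Smalheiser; the function names and the sample author records in the usage note are invented for the example.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Lowercase a name and strip diacritics (Martínez -> martinez)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def name_key(last_name: str, first_name: str):
    """Index key of the kind a bibliographic database might use."""
    return (fold(last_name), fold(first_name)[:1])


def ambiguous_groups(authors):
    """Group (author_id, last_name, first_name) tuples by key.

    Returns only the keys shared by more than one distinct author.
    """
    groups = defaultdict(set)
    for author_id, last_name, first_name in authors:
        groups[name_key(last_name, first_name)].add(author_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Given two made-up authors "Wei Wang" and "Wen Wang", both collapse to the key `("wang", "w")` and are reported as one ambiguous group, whereas a single "Carlos Martínez" stays unambiguous under the key `("martinez", "c")`.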

Name disambiguation and proper attribution is a well-known problem in the scholarly publishing ecosystem. There have been and there are lots of initiatives trying to tackle this complex issue at subject, institutional or even national level - with remarkable success in the case of the Dutch Digital Author Identifier (DAI).

However, this is not an issue to be tackled at national or subject level, but globally. Commercial stakeholders such as Thomson Reuters or Elsevier-Scopus are therefore in a privileged position to implement some international author unique identification schema. From a knowledge discovery viewpoint there are however some problems in this commercial-stakeholder approach: the ResearcherID, Thomson Reuters' author identifier, will provide seamless integration with ISI Web of Knowledge and show all author publications registered in that database, but will otherwise leave out most of the research output.

Some joint effort between public institutions and private stakeholders (remarkably publishers) must therefore be attempted to unify the multiple author identification standards and devise a single, comprehensive one at a global level. And that's where ORCID comes in.

  
... and strategies to tackle it: IRISC 2011 workshop

The Open Researcher & Contributor ID (ORCID) initiative started in Dec 2009 as a non-profit organisation. Currently over 240 participants have joined the project to develop a single researcher identifier which is not limited to discipline, institution or geographical area. Many other projects are working on this issue at the same time (such as the abovementioned discipline-based PubMed Author ID and Cornell University's VIVO initiative, initially institutional and since grown to national scale).

ORCID and VIVO were two of the main topics of the IRISC 2011 Workshop on Identity in Research Infrastructure and Scientific Communication held this week (Sep 12-13) in Helsinki - see the event programme with attached presentations. Gudmundur "Mummi" Thorisson, Research Associate at University of Leicester and member of ORCID Technical Working Group, was IRISC 2011 main organizer.

There were two major IRISC 2011 strands: identity regarding knowledge discovery and identity for security & access control (focusing mainly on identity federation). A third big cross-cutting issue throughout the Helsinki event was research data management, from three different perspectives:

i) dealing with a rapidly increasing amount of biomedical research data (Andrew Lyall, EMBL, ELIXIR Project)

ii) dealing with clinical research sensitive data (see Tony Brookes GEN2PHEN Project presentation)

iii) benefits the ORCID implementation might bring to research data attribution and management (mentioned in most ORCID-related presentations and discussions during the workshop)


There were several presentations dealing both with ORCID and closely resembling VIVO initiatives. Martin Fenner, Hannover Medical School and member of ORCID Board of Directors announced the ORCID registration service will start operating in spring 2012. ORCID will be open: researchers will be able to manage & maintain their profiles, filed data will be openly available, ORCID-related software will be released as open source, and researchers will control their privacy settings (with a chance too to share with particular members). Finally, for ORCID identity definition purposes, self-claim as well as external claiming sources will be used.

Brian Lowe, Cornell University, presented the already running NIH-funded, institutionally-managed VIVO initiative. VIVO is aiming for a more comprehensive approach than ORCID, based on an extensible semantic model. However, links have already been established between both initiatives and ORCID is hoping to build upon VIVO's success in the US.


Breakout sessions were held on IRISC Day 2 on the workshop's two main strands: "Unique identifiers and the Digital Scholar" (led by Cameron Neylon and Jason Priem) and "What do researchers need from the authentication and authorisation infrastructure (AAI)?" (chaired by Michael Linden, CSC). Breakout session #1 was devoted to discussing potential tools and services ORCID could provide to researchers in the short term (6 months from adoption). Several groups were set up for the purpose and the proposed ideas were later discussed and voted on in order to select three main future lines of work for ORCID. The proposed use cases were the following (selected ones are marked with an arrow):

-> data submission to repositories (multiple task attribution)

service to enable attribution or comment

pre-populate ORCID data

-> manuscript/grant tracking system

ORCID app gallery

-> automatic CV maintenance (potentially including data citations in CVs)

connecting different author research & social network profiles

Selected ORCID use cases were later introduced by Cameron Neylon in his talk 'ORCID and researchers' at the second annual ORCID Outreach Meeting held at CERN on Sep 16th, 2011.

Monday, 29 August 2011

Research data management in crystallography at the XXII IUCr Congress


  On Aug 29th a session on research data management will be held at the XXII Congress of the International Union of Crystallography (IUCr2011). The session will feature talks by Brian McMahon (IUCr), Brian Matthews (I2S2 Project), Peter Murray-Rust (CrystalEye), John Westbrook (wwPDB) and Nick Spadaccini (DDLm). Peter Murray-Rust's talk in the session will be on Open Crystallography.

Saturday, 20 August 2011

Repositories and CRIS: Working Smartly Together


  Due to recent involvement in other OA repository-related activities at the University of Khartoum, reports at this blog on recent events such as the 'Repositories and CRIS: Working Smartly Together' workshop organised by RSP last Jul 19th in Nottingham and the 4th edition of the Repository Fringe in Edinburgh were slightly delayed. Good news about it is that interesting reports on these events have been published in the meantime (see the RSP event review by Gareth J. Johnson at UKCoRR blog). This will allow Sonex to take a different approach to the reporting, making it more of a reflection than of a description, as well as covering the conference followup.

One of the subjects discussed during the RePosit project session within the Conference at EMCC was what mailing list or discussion group should replace the reposit@googlegroups.com forum for discussing IR and CRIS-related issues once the RePosit project comes to an end. Several options were considered, from using already existing lists such as UKCoRR's or ARMA's, to creating a new Super-CRIS list at JISC mail such as cris-super@jiscmail.ac.uk. Steps are being taken after the workshop to make this new list available.

The REF is working as a very strong driver towards CRIS implementation (with the CERIF format being widely considered as a potential standard, see Marc Cox's presentation). A good number of HEIs do now operate a CRIS as a result (either commercial, in-house built or an extension of their EPrints repository). That is the good news. The not so good news may be that, because CRIS systems offer an enhanced collection of features, RIM infrastructure managers are starting to wonder whether an Open Access repository (usually managed by the Library) is becoming a somewhat redundant piece of software, with most of its functionalities being increasingly covered by the CRIS (managed at the Research Offices). Repository phase-out is thus beginning to be discussed at some institutions for integration and optimization purposes. However, as Janet Aucock (University of St. Andrews) writes on the reposit@googlegroups list, even if the degree of overlap between repositories and CRIS systems may be large and growing, there are still features a CRIS will not be able to deliver:

"(...) Another point is to do your homework really well and make absolutely sure that the CRIs can deliver everything that a repository can do. Can it provide established permanent identifiers for items? Can it handle embargoes effectively? What about stats? Does the discovery interface in the portal display all the metadata that you need with regard to open access full text eg rights statements etc. These are small details which we take for granted but are not always embedded into the CRIS. CRIS software is still evolving too, and perhaps not all the functionality necessary is there yet. Another aspect of this is the question of the interfaces for users and discovery. Is the CRIS successfully harvested or crawled by search engines. Is it ranked appropriately. Can it expose metadata appropriately to other services where required? Can it isolate metadata with full text attached/open access full text attached and allow that set to be harvested and reused? We know that our own CRIS supplier is still working on adding all the "repository" functionality that they think is needed for their product. But at the moment I don't know the fine detail of this".

Besides R4R/CERIF4REF Project at KCL mentioned by Marc Cox, other projects also dealing with CERIF implementation regarding CRISes were mentioned such as MICE for Measuring Impact under CERIF, or the BRUCE Project (Brunel Research Under a CERIF Environment) that was presented at the 2011 euroCRIS meeting in Bologna last May (see Sonex post on the two recent euroCRIS meetings in Italy).

Another interesting outcome of this RSP event was the opportunity to learn from local SHERPA RoMEO team about the RoMEO API new v2.8 version and the release of the SHERPA RoMEO Publisher's Policy Tool, that will allow publishers to directly define their RoMEO policies via an embedded portal in SHERPA (actually presented next day, Jul 20th, at the 'RoMEO for Publishers' event in London).

Finally, a poster was featured in the event poster section called “SICA: A CRIS with an embedded Repository working for the innovation in Andalusia Region (Spain)”. With this integrated system for recording scientific production of the researchers belonging to nine universities, research organizations, technology centres and other scientific institutions of the Andalusia region in Spain, the National & Regional CRIS/IR integration initiatives (as recorded by Sonex in its May'2010 post) keep growing. This particular CRIS initiative is being developed within the European SISOB Project on -yet again- how to measure the impact of science in society.

Besides this -not thorough nor systematically updated- Sonex list of National & Regional CRIS/IR integration initiatives, a comprehensive list of 'CRIS + Repositories in the UK' is being put together as a Conference followup. When complete (it's open for any missing one to be filled in) the list will join the RSP Wiki where Institutional Repositories in the UK are already listed as to provide a clear picture of existing infrastructure.

Sunday, 17 July 2011

KULTURising research repositories


  "...I can only add that research for art, craft and design needs a great deal of further research. Once we get used to the idea that we don't need to be scared of 'research' - or in some way protected from it - the debate can really begin."
(Christopher Frayling, RCA Rector (1996-2010), from: "Research in Art and Design" (Royal College of Art Research Papers, Vol 1, No 1, 1993/4). Royal College of Art, London).


  At the Jul 6th meeting at JISC Brettenham House some planning was also done for the Sonex extension, besides that of Swordv2. In the framework of this project extension, Sonex is expected inter alia to further support the JISC Deposit Projects and continue to gather international deposit use cases, as well as to provide some recommendations on how to improve deposit.

As part of this further involvement with JISC Deposit Projects, Sonex attended the Kultivate Project Conference on Jul 15th at the Royal Institute of British Architects (RIBA).


Based at the Visual Arts Data Service (VADS), a research centre at the University for the Creative Arts, and funded by the JISC from late November 2010 to the end of July 2011 within the JISC Deposit strand, the Kultivate Project aims to "share and support the application of best practice in the development of institutional repositories that are appropriate to the specific needs and behaviours of creative and visual arts researchers". Kultivate builds upon the knowledge and experience of the Kultur II group, which grew out of the JISC funded Kultur project (2007-2009). The Group currently consists of over forty institutions and projects and is led by the VADS.

Specific goals of the Kultivate project are:

- to increase the rate of arts research deposit,
- to enhance the user experience for researchers, and
- to develop and sustain a sector-wide community of shared best practice in arts research repositories.

There are significant differences between Kultivate and the rest of the JISCdepo projects (RePosit, DURA and DepositMO): while the other three deal specifically with semi-automating the ingest of widely-recognised content into repositories (mainly by fostering platform interoperability), Kultivate seeks to extend the coverage of institutional repositories to the creative arts environment, which is rather different in nature from that well-accepted research and has not been specifically addressed so far as scholarly output. In this regard, Kultivate can be seen both as something of an outlier project and as the most challenging of the four.


After eight months of hard work, the Kultivate Project Conference put together a model set of talks and presentations (see programme and updated presentations) to introduce the project outcomes.

Several talks made introductory reflections on what creative arts research should be - with its specific peculiarities. The fact that the output from activities in the creative arts is or is not called research (artists themselves sound a bit surprised sometimes on being called researchers) doesn't seem that relevant anyway - main thing actually being it's scholarly output from many HEIs and Arts Schools, and as such it should be subject to standard deposit into institutional repositories.

However, it is often hard to persuade artists to have their work filed into repositories ("the repo doesn't fit the needs of creative artists" is a frequent objection to taking part in the project). In this regard, advocacy is particularly critical for institutional projects being carried out in the area - they are breaking through in a discipline where no such thing could possibly exist (so far) as PubMed, Chemical Abstracts or arXiv.

See examples of effective advocacy under the Kultivate project umbrella at Goldsmiths Research Online and UAL Research Online, plus the own Kultivate Advocacy Toolkit, one of the project's main outputs.

Another relevant development Kultivate is promoting is the setting of metadata standards for the description of creative artworks (something that incidentally brings the project closer to the data management strand than to the deposit one, making it quite a heterodox one). See for instance 'The listening room' item at UAL Research Online with its four-tabbed description including metadata as well as images and videos (thus effectively answering artists' frequent complaints about work documentation: "I did a performance, not a video" or "Fine, but where am I?").


Performance Art Data Structure (PADS), for which the unit subject to description is the 'work' not the 'digital object', is yet another solution for complex description of creative arts output, developed by the University of Bristol within the JISC-funded CAiRO Project for Complex Archive Ingest for Repository Objects (see the example PADS record for the 'Becoming snail' performance by Paul Hurley at JISC Digital Media).
PADS is also involved in the Europeana attempt to standardise performance metadata across the EU.

Finally, a good (and growing) number of EPrints-based implementations of the Kultur enhancements for designing creative arts output-focussed institutional repositories were presented at the project conference (incidentally raising questions from DSpace-based IR managers on when something similar will be developed for DuraSpace). Kultivate has also provided (in cooperation with the University of Southampton team) a set of technical enhancements to the EPrints platform, among them to the MePrints application and the IRStats package.


Implementation of those enhancements by different institutions (either arts-focussed or general-purpose ones with Arts Departments within them) is giving way to a wave of repository KULTURisation (ie being adapted to deal with creative arts output) across the UK that might well spread beyond that once working standards are consolidated. In the meantime the VADS-led eNova project is already building upon the outputs of both Kultivate and Kultur projects.

Monday, 11 July 2011

Sword-Sonex project extension


  "Data deposit nowadays... is mainly based upon submission by email... and remains labour-intensive"
(Simon Hodson, JISCMRD Programme manager, on present data deposit workflows)


Representatives of the JISC-funded Sword and Sonex projects met Balviar Notay and Simon Hodson (JISC) on July 6th at Brettenham House, London, to further discuss the extension of Sword v2 to the automated transfer of research data (see reference to the last meeting on the issue on Nov 20th).

Now that the first round of JISCMRD Phase I projects is over and final reports have been published, the Sword-Sonex team is putting together a data transfer use case document that lists the different project solutions with their advantages and shortcomings, so that it can be analysed how Sword might aid the automation of dataset transfer into repositories (or similar target resources for research data). The team will liaise with several JISCMRD projects in order to find out their specific approach to the data transfer issue. The time schedule for the extended Sword project (coordinated by Paul Walk, UKOLN) is as follows:

WP1: Identify key projects & individuals who have relevant information and skills regarding datasets [Jul 6-13]

WP2: Document the dataset use cases in collaboration with Sonex [Jul 18-end Aug]

WP3: Interpret the data set use cases as processes carried out with Sword [Sep 5-24]

WP4: Carry out gap analysis on dataset use cases on Sword and recommend future work, and produce a web resource for any new or existing JISC projects (such as those in JISCMRD2 Programme) to refer to, which will provide all the relevant information regarding dataset deposit [Sep 27-Oct 21]

WP5: Identify key Sword clients and potential client environments, accept and evaluate proposals, issue development contracts [Jul 6-Aug 15]

WP6: Development of 1, 2 or 3 client environments [Sep 5-end Oct]

WP7: Project management and administration [Jul 6-end Oct]

Sunday, 13 March 2011

Strategies for research data deposit in ongoing data management projects


  Before starting pattern analysis for research data deposit into (institutional or subject-based) data repositories, whether or not open access, the first step for Sonex is to scope ongoing projects dealing with that kind of deposit, as well as already closed projects which supplied relevant guidelines on the subject. A list of projects working on data management follows, with their specific approach to actual data deposit as taken from project blogs:


TARDIS (Monash University–Australian National Data Service).
“There is a pressing need for the archival and curation of raw X-ray diffraction data. However, the relatively large size of these datasets has presented challenges for storage in a single worldwide repository. This problem can be avoided by using a federated approach, where each institution or university utilizes its institutional repository”.


ADMIRAL: A JISC-funded data management infrastructure for research across the life sciences.
"The purpose of the ADMIRAL Project is to create a two-tier federated data management infrastructure for use by life science researchers, that will provide services (a) to meet their local data management needs for the collection, digital organization, metadata annotation and controlled sharing of biological datasets; and (b) to provide an easy and secure route for archiving annotated datasets to an institutional repository, The Oxford University Data Store, for long-term preservation and access, complete with assigned Digital Object Identifiers and Creative Commons open access licences".
(See Oxford University Library Services' Databank)


XYZ Project. “The XYZ Project will create a demonstrator of a new workflow for publishing data in support of full-text. The author prepares data for publication (if possible with validation) in a third-party trusted repository before the paper is submitted to a publisher. Our software will manage the deposition, release to reviewers, dis-embargo and for conventional publication or as a data journal. Two Open Access publishers (International Union of Crystallography and BioMed Central) are engaged with the project and will test the new workflow”.
Anticipated Outputs and Outcomes: A demonstrator repository hosted by the IUCr.


FISHnet: Freshwater information sharing network. “This project will allow researchers in multiple academic, governmental and voluntary-sector institutions to share their data. Data will be held securely in a sustainable subject repository which preserves and disseminates multiple datasets as part of the FreshwaterLife.org information portal. Data creators will be able to manage access rights to their content, from Open Access to sharing with trusted colleagues”.


DMBI: Data Management in Bio-Imaging. “The quantity of data generated by modern high-throughput bio-imaging systems presents a significant challenge in both data management and processing. Furthermore, there is no explicit system/way to record the processing algorithms and parameters that are used to produce results. Thus there is no strong link between images, software and results. This projects aims to address these issues”.
Anticipated Outputs and Outcomes: Build a prototype DMBI system around OMERO.


CAiRO: Curating Artistic Research Output. “No prominent subject-based repository exists to act as the custodians of arts practice-as-research data. Where institution provision for data management is in place (for instance, an institutional repository service) the arts researcher-practitioner cannot always rely on an understanding of the special nature of arts research data. More commonly, data is retained in departmental collections, built and maintained by small teams which often include researchers themselves”.


BRIL: Biophysical Repositories in the Lab. “The BRIL project aims to enhance the repository facilities at the Randall Division of Cell and Molecular Biophysics at King’s College London. This will involve:
» Embedding the repository within the researchers’ day-to-day research and experimental practices;
» Integrating the repository into the wider King’s infrastructure”.
Example of KCL “internal” repository: Mutation Testing Repository.


ADS+: Enhancing and Sustaining the Archaeology Data Service digital repository. The project aims to “Increase the sustainability of the ADS, by implementing Fedora (Flexible Extensible Digital Object Repository Architecture). This is a world-leading open source digital repository application which will allow the automation of many ADS curatorial functions, according to the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003). This will help ensure the long term preservation of all ADS digital archives, as well as making the ADS archival procedures more cost-effective”.


IDMB (Institutional Data Management Blueprint) Project, U. Southampton.
The project’s aims are to provide the University of Southampton with a ten-year roadmap for delivery of a comprehensive data management infrastructure.

[IDMB Recommendations] The data management audit and gap analysis indicates where improvements can be made in the short, medium and long-term to improve data management practices and capabilities at the University. The following preliminary recommendations are put forward for short (one year), medium (one to three years), long (more than three years) term action.
[Short Term (1 year)] Crucial to supporting researchers is the consolidation of data management into a coherent framework that is easy to understand, use, and has a sustainable business model behind it. A number of major recommendations are put forward here for the short-term:
• Create an institutional data repository
• Develop a scalable business model
• One-stop shop for data management advice and guidance


MaDAM: Pilot data management infrastructure for biomedical researchers at University of Manchester.
A pilot infrastructure for Biomedical Researchers at the University of Manchester, which covers data capture, data storage and data curation. This infrastructure comprises procedural support, hardware and software.
[18/03/2010] The development team have built a prototype data management front end which fits a generic set of needs amongst our Life Sciences researchers. It is aimed at being flexible enough to allow researchers themselves to assign attributes (i.e. metadata) to their experiments and datasets for them to be usefully categorised and tagged. The prototype is also entirely dispensable and intended as a catalyst for feedback from our use cases on their specific functionality requirements.


DISC-UK DataShare Project. The DISC-UK DataShare project, led by EDINA National Data Centre and the Edinburgh University Data Library, with partners at the Universities of Southampton and Oxford, has advanced the current provision of repository services for accommodating datasets in the UK.
Key conclusions: 1) Data management motivation is a better bottom-up driver for researchers than data sharing but is not sufficient to create culture change, 2) Data librarians, data managers and data scientists can help bridge communication between repository managers & researchers, 3) Institutional repositories can improve impact of sharing data over the internet.

Thursday, 3 March 2011

Repository take-up and embedding: the future of repositories


  Being already in Birmingham for the JISC Deposit Project Meeting on Mar 1st, Sonex stayed in town to attend the JISC Repositories Take-Up and Embedding Meeting as well. The start-up meeting for this new JISC programme aimed to outline the future of repositories, dealing with specific issues such as (automated) deposit, shared services like RoMEO or OpenDOAR, repository integration into general software infrastructures for research information management, and the promotion of national (via RSP) and international (via KE, COAR and OpenAIRE) collaboration.

Six projects were presented during this programme start-up meeting:

- Bringing a Buzz to NECTAR (Miggie Pickton, University of Northampton)
- Hydrangea: letting the repository flower (Richard Green, University of Hull)
- MIRAGE 2011: Repository Enrichment from Archiving to Creation (Xiaohong Gao, Middlesex University)
- Enhanced interface design for supporting take-up and embedding of the Glasgow School of Art research repository, including visual engagement with practice led and applied outputs (Robin Burgess, Glasgow School of Art)
- eNova (Marie-Therese Gramstadt, VADS)
- EXPLORER: Embedding eXisting & Proprietary Learning in an Open-source Repository to Evolve new Resources (Alan Cope, De Montfort University)

An extra postprandial presentation on repository consolidation within a university research information management environment, and on the way this was done at the University of Glasgow's Enlighten IR, was delivered by William Nixon. Statements like "Silos are the past, embedding repositories -through the use of tools like Sword or LDAP- is the future" made the point on how repositories should evolve. According to William, repositories are to exploit new opportunities for data mining, business intelligence, KPIs, analytics, 'stickiness' and visibility (some of these issues being thoroughly dealt with at the Enlighten repository blog).

There was a remarkable presence of image-related projects among the presentations, with Glasgow School of Art, eNova and MIRAGE 2011 all dealing with the archiving of images into repositories one way or another. This is great news for the momentum-gaining development of new information infrastructures in the area (also traceable at the JISC Deposit Programme meeting the day before), which will no doubt benefit from these projects' outcomes.

Seen from a Sonex point of view, the presented projects could particularly benefit from interacting with JISC Deposit projects in terms of implementing the resulting strategies for automated content ingest into repositories. A handful of the take-up and embedding projects would thus be the soundest candidates for an initial "customer implementation" of the various methods for quick population of repositories with institutional research output (the take-up bit, prior to embedding) coming from the Deposit strand. As these projects will run until the end of 2011 and the ones from the Deposit strand should deliver around July, interaction among them could probably be easily achieved.

One project among those presented particularly captured Sonex's attention: MIRAGE 2011, Middlesex Medical Image Repository with a Content-Based Image Retrieval Systems Archiving Environment. MIRAGE is both an image-related repository project (as it deals with medical images) and a research data project, and it is this latter feature that brings it fully within the scope of Sonex activity with regard to research data management. Ongoing data management projects (either JISC-funded or otherwise) usually deal with either numerical or textual data, but projects dealing with the deposit of graphical research data are rare (save for the Data Management in Bio-Imaging - DMBI project run at The John Innes Centre, BBSRC, Norwich).

A couple of references were shared with MIRAGE project manager Dr. Xiaohong Gao so as to promote synergies among different projects in the same area: the 'Feeding Neuroimaging Repositories' poster, presented at OR2010 in Madrid last July by a team of Universitat Autònoma de Barcelona (UAB)-Hospital de la Santa Creu i Sant Pau researchers in Barcelona, and the MIDAS/National Alliance for Medical Image Computing (NAMIC) medical image repository.

The meeting presentations will shortly be available.

Tuesday, 3 August 2010

IRs as institutional assets for future Research Assessment Exercises

Beyond their relevance for open access dissemination of research output, the new role of Institutional Repositories as a key institutional research infrastructure for present or future Research Assessment Exercises was extensively debated last month at the Open Repositories Conference in Madrid (some good posts on OR10 are available at CAIRSS). For this purpose, IRs should be embedded into the general institutional research information system, which brings up a series of integration/interoperability issues that lie at the heart of the Sonex work.

See below an analysis of several CRIS-IR integration possibilities for creating an institutional research information infrastructure that will live up to the challenge posed by future research assessment exercises.



Considerations on the role of the CERIF standard were intentionally left out of the picture, as some debate is still taking place on whether or not it should be the base standard for CRIS-IR integration. Most implementations available have until now chosen CERIF-based integration strategies to tackle the issue, but from ad-hoc light-CERIF versions to entirely non-CERIF solutions, there is still a high level of diversity in the way institutions are facing this challenge. At the same time, CERIF4REF is being steadily worked out at KCL, and CERIF architecture is also being gradually brought into new EPrints versions.

A variety of research information system implementation use cases for RAE/REF purposes was also shown in Peter Burnhill's (Sonex/EDINA) "Repository Update UK" presentation at the JISC/CNI meeting last month: from IRs being used as REF gateways to the challenge this poses in terms of open access availability of contents, a whole set of issues arises as IRs undergo enhancement to fulfil their new role.