The first objective of the JISC-supported Sonex initiative was to identify and analyse deposit opportunities (use cases) for the ingest of research papers (and potentially other scholarly work) into repositories. Later on, the project scope widened to include the identification and dissemination of various projects being developed at institutions in relation to the deposit use cases previously analysed. Finally, Sonex was recently asked to extend its analysis of deposit opportunities to research data.







Saturday, 11 February 2012

SONEX work on repository interoperability to be presented at the 2nd Open Access Forum


  The communication "The SONEX Workgroup for the Analysis of Repository Interoperability Issues: a Summary of Activities" (in Spanish), presented by the JISC-funded SONEX Workgroup, has been accepted for the 2nd Open Access Forum, to be held on Apr 16-17th alongside the INFO2012 conference in Havana, Cuba. The motto for this 2nd Open Access Forum is "Interoperability: the Basis for the Ecology of Open Access Repositories".


The selected list of topics for the 2nd Open Access Forum includes:

  • Standards for Open Access Repository (OAR) Interoperability

  • CRIS/OAR Interoperability

  • Value-Added Services based on Repository Interoperability (such as Repository Usage Aggregation Systems)

  • Linked Data and Enriched Digital Objects

  • Integration of Repositories and Electronic Publishing Platforms

  • Semantic Interoperability

  • Interoperability between Open Access Repositories and e-Learning Platforms

  • Distributed Repository Networks

Tuesday, 13 December 2011

Thematic parallel session on metadata - actions to be taken


  On Day II of the JISC MRD Programme 2011-13 launch event in Nottingham, last Dec 2nd, subject-based discussion sessions were held among the different JISCMRD02 Projects for research data management in order to promote synergies and joint work on common issues. This is a brief report on the outcomes of the parallel session on metadata. Other sessions were held simultaneously for Institutional, Life Sciences, Engineering and Archaeology MRD projects, and their discussions have been reported elsewhere (there are also other posts summarizing the talks at this session).

It was really hard for some of us to pick just one of those groups, since many projects actually belonged to several strands (some lucky ones also had two representatives at the event, it should be noted). The session on metadata was attended, among others, by:

- Anna Clements (U St Andrews)
- Simon Kerridge (U Sunderland)
- Kevin Ginty (U Sunderland)
- Charlotte Pascoe (British Atmospheric Data Centre)
- Pablo de Castro (SONEX Workgroup)
- Simon Hodson (JISC MRD Programme manager)
- David Shotton (U Oxford)
- Louise Corti (UK Data Archive)
- Marco Fabiani (Queen Mary U London)
...


Discussion

Metadata standards were repeatedly discussed throughout the session, and there was a joint (and unsuccessful) attempt to recall whether a registry of metadata standards for different disciplines was available anywhere. Representatives from the CERIF4Datasets Project, University of Sunderland, mentioned they were using the MEDIN metadata standard for their work in marine sciences data management. The Core Scientific Metadata Model (CSMD) standard, developed at STFC for the I2S2 Project, was also mentioned as an interesting approach to a multi-disciplinary metadata standard for structural sciences such as Chemistry, Materials Sciences, Earth Sciences or Biochemistry. Finally, the PIMMS Project (BADC/U Reading) mentioned Metafor as a Climate Science metadata standard and their goal of using the PIMMS software tool to generate CIM-based content.

At some point the idea caught on that metadata standards should perhaps be mandated by publishers in order to harmonise discipline-specific data description procedures. Publishers are actually involved in several very successful international RDM projects, such as Dryad, but (save for REWARD) are notably absent from the JISCMRD02 projects.

Having previously developed the Semantic Publishing and Referencing (SPAR) Ontologies, David Shotton said he was now working on their extension to CERIF-based metadata description of datasets, which is closely linked to dataset CERIFication work being carried out at the CERIF4Datasets Project.


Actions

The following actions were proposed for improving the chances of metadata standard harmonisation - hence enhancing dataset discoverability:

  • Trying to locate (or otherwise collect) an already existing registry of metadata standards for different disciplines, in order to offer researchers from a given discipline an already tested metadata schema they can re-use,

  • Mapping metadata standards to each other, aiming to produce a minimum-sufficient-information metadata set that may be widely applicable across disciplines,

  • Taking steps towards organising a workshop in order to have metadata issues discussed among relevant stakeholders. The ANDS Metadata Workshop in 2010, with its discipline-based approaches to metadata standards, might be a source of inspiration for this. The dates proposed for this Metadata Workshop were spring-summer 2012.
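
The second action above (mapping standards to each other to find a minimum common set) can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: the standard names and field names below are hypothetical placeholders, not elements taken from MEDIN, CSMD, CIM or any other real specification, and a real crosswalk would need far more careful semantic matching.

```python
from typing import Dict, Set

# Hypothetical crosswalks: each maps a standard's own field name to a shared
# concept label. Names are illustrative only and are NOT taken from any real
# metadata specification.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "standard_a": {
        "resource_title": "title",
        "originator": "creator",
        "date_of_publication": "date",
        "bounding_box": "spatial_coverage",
    },
    "standard_b": {
        "investigation_name": "title",
        "investigator": "creator",
        "start_date": "date",
        "instrument": "instrument",
    },
    "standard_c": {
        "long_name": "title",
        "responsible_party": "creator",
        "simulation_date": "date",
        "model_component": "model",
    },
}


def common_concepts(crosswalks: Dict[str, Dict[str, str]]) -> Set[str]:
    """Return the shared concepts that every standard is able to express."""
    concept_sets = [set(mapping.values()) for mapping in crosswalks.values()]
    return set.intersection(*concept_sets) if concept_sets else set()


def to_common_record(
    standard: str,
    record: Dict[str, str],
    crosswalks: Dict[str, Dict[str, str]],
) -> Dict[str, str]:
    """Project a discipline-specific record onto the minimum common set."""
    shared = common_concepts(crosswalks)
    mapping = crosswalks[standard]
    return {
        mapping[field]: value
        for field, value in record.items()
        if field in mapping and mapping[field] in shared
    }
```

With these placeholder crosswalks, only the concepts present in all three standards survive the projection; discipline-specific fields such as a bounding box are dropped from the common record but would of course remain in the original description.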


Finally, there was a wrap-up by the different subject-based project groups, which showed strong possibilities for more stable cooperation among them (the Biomedical/Healthcare projects even discussed the possibility of building a common wiki). Some cooperation frameworks (Google Groups, mailing lists) might be set up to promote this cross-project collaboration within disciplines. Regarding the metadata strand, it should be noted that metadata was also an issue in the discussions held at most subject-specific workgroups, so it could potentially draw contributions from all of them.

Friday, 2 December 2011

The dawn of a new JISC MRD programme - Day I



  After a successful first stage of the JISC Managing Research Data (MRD) Programme (2009-2011), a second phase of JISC MRD was launched yesterday at the NCSL Conference Centre in Nottingham, during a 2-day event that continues today. The JISC MRD02 Programme includes 27 projects classified in three different strands:

Strand A. Research Data Management Infrastructure: 17 projects, to be completed from Mar to Jul 2013, comprising Institutional Pilot projects, Institutional Embedding and Transition to Service projects, Disciplinary projects for creative arts and archaeology, and a Metadata project,

Strand B. RDM Planning: 8 projects running until Mar 2012, aiming to design and implement data management plans and supporting services for researchers,

Strand C. Enhancing DMPOnline projects: 2 projects, aiming to customize and enhance the DCC DMPOnline Tool to improve its interaction with institutional/disciplinary information systems.

It is worth noting that a number of the RDM projects funded in this 2nd programme stage are building upon previous pilot work (projects carried out during the JISC MRD programme 2007-2011) in order, for instance, to extend and embed data management services across the whole institution.

In describing the research data management programme, Simon Hodson, JISC MRD programme manager, mentioned there will be two further JISC MRD calls as early as Jan 2012, dealing with:

- Research data publications, aiming to build partnerships among involved stakeholders and encouraging data citation and publication,

- RDM Train, aiming to design and implement data management training strategies for specific disciplines and support roles (including librarians), to be performed by linking to professional bodies.

Emphasis will also be placed during this 2nd JISC MRD programme stage on evidence gathering for project benefits and impact. A session devoted to these issues will be held on Dec 2nd, with practical work on both the Benefits Framework Tool and the Value Chain Impacts Tool. Developing metrics for measuring project impact is a specific programme goal in this 2nd implementation stage.

Project blogging

Another main JISCMRD02 objective, closely related to impact measurement, is to promote project dissemination and interaction, both among the projects and with the broader community, via blogging. A presentation on 'blogging practices to support project work' was delivered for this purpose by Brian Kelly, UKOLN. The presentation highlighted the relevance of publishing project blogposts as an alternative means of expression to writing research papers or code, and engaged the audience in finding shared views on the potential benefits blogging may bring to RDM projects, also providing some useful technical advice along the way.

Subsequent discussion focused on pros and cons of blogging as a communication technique (both from regular bloggers' and researchers' viewpoint), as well as on potential advantages of JISCMRD project blog aggregation, with a common RSS feed embedded back into the JISC site.

Parallel sessions and poster-session networking

Two parallel sessions came afterwards, dealing with two principal RDM issues. The first one, on DCC Tools, introduced the Data Asset Framework (DAF), DMPOnline and CARDIO, and was summarized by Paul Stainthorp, U Lincoln, in his JISCMRD02 Day I blogpost.

The 2nd parallel session dealt with UMF Tools and related RDM projects. It featured presentations by John Milner on JANET Brokerage and Andy Powell on the Eduserv Cloud Pilot, during which the strategy for Academic Cloud service implementation was described, based on the "work with the willing" principle. The Dynamic Purchasing System (DPS), originally developed for utilities such as water or electricity, will be re-used as the purchasing framework for cloud-related services. Regarding Eduserv, a 2-month 'introductory tier' will be available (for institutions only) during the gradual implementation of the service. Storage is currently single-site, with no backups at this pilot stage, though there are plans to offer tape backup for part of the stored infrastructure.

After an interesting Q&A session, in which backup was suggested to be an absolute requirement for the success of the initiative and there were questions on various details of Eduserv use (such as the possibility of using departmental purchase orders instead of credit cards for academic use), five projects from the UMF strand were briefly presented. All of them are already working with a SaaS approach, in the cloud, or both: BRISSkit (Jonathan Tedds, U Leicester), DataFlow (David Shotton, U Oxford), Smart Research Framework (or ELB software as a service, Tim Parkinson, U Southampton), VIDaaS and YouShare. Slides for these presentations will shortly be available and will be linked from here.

Finally, the official Day I programme ended with a poster session and networking event, which was a really good opportunity for RDM projects to interact with each other and with 'fellow travellers'. Synergies among projects became quite evident with all of them displayed together on a set of panels. Having their representatives available and willing to discuss each project's aims, challenges and similarities to others offered a very good chance to get the general picture along with the details, as well as to establish inter-project liaisons that went on well past closing time.



Wednesday, 19 October 2011

MaDAM: A JISC MRD Project for Research Data Management in the Biosciences... on the move


  Being in Manchester for the JISC Research Information Management (RIM2) event, Sonex didn’t miss the opportunity to pay a visit to the University of Manchester John Rylands University Library and meet the JISC MRD MaDAM Project team. The 'MaDAM Pilot data management infrastructure for biomedical researchers at University of Manchester' was funded by the JISC Managing Research Data Programme from Oct 2009 to Jun 2011 and has provided an inspiring example of how to start building an institutional research data management infrastructure almost from scratch.

In order to start developing this RDM infrastructure (see the Project Final Report for details), MaDAM focused on a set of research groups from the biomedical sciences strand aiming to learn about the ways they dealt with data management and to provide them -with their own close involvement- with tools to improve and standardise such practices. Selected research groups -Electron and Standard Microscopy group and Magnetic Resonance Imaging (MRI) Neuropsychiatry Unit- were chosen due to their common need to deal with large images as their main source of research data.

The project's focus on a rather narrow research scope was one of the keys to its success, since it made it possible to define common ways of dealing with the information, e.g. at metadata level. The MaDAM planning included further extension of the RDM strategy to other research groups within the UoM, based on the lessons learnt from its application to the few initially selected groups. The MiSS Project (MaDAM into Sustainable Service), funded by the JISC MRD Programme 2011-2013, will deal with extending and widening the RDM strategy to the whole of the UoM's research work over the coming years.


An Oracle APEX-based research data management application was developed by MaDAM for the UoM research groups concerned, later to be revamped in order to adapt it to the regular software standards applied at UoM. Frequent meetings were held with researchers during application development so that their feedback could be collected to ensure it would meet their needs. Storage needs per researcher per year were estimated (at around 500 GB), a metadata standard for specific data description was devised and stored in the RDM application, and work was carried out with interoperability issues in mind, both with the University CRIS, in order to automatically populate Grant and Project information attached to datasets, and with the UoM Fedora-based eScholar IR, to which final-version datasets would be transferred via SWORD for dissemination, sharing and re-use.


During the MaDAM Project several conceptual needs regarding the implementation of a solid RDM infrastructure across the UoM (and beyond) were identified and later included in the Project Final Report. The two main ones are the following:

- Some means of academic recognition of data-related work by researchers should be put in place in order to promote their involvement in RDM schemas and the adoption of common practices,

- A research data management policy should be adopted by the University of Manchester similar to the one issued at U Edinburgh so that some guidelines are established for providing support to researcher RDM tasks.

The gradual roll-out of MaDAM to other UoM research groups will face a set of challenges, research data being so discipline-specific. However, plans for such an extension, and for ensuring the institutional support it requires, were designed during MaDAM development (which saw a number of additional UoM research groups express interest in taking part in the pilot project), and extension work will start soon.

Friday, 14 October 2011

CERIFying Research Information Systems... and Research Data


  A couple of weeks ago Sonex attended the JISC Research Information Management (RIM2) event at MCC Manchester. It was a very good opportunity to review the four JISC-funded projects (BRUCE at Brunel, IRIOS at Sunderland, CERIFy at UKOLN and MICE at KCL) dealing with CERIF implementation for research information management purposes. A report on the event should be available shortly, along with the slides presented there.

During this one-day meeting the CERIF for Datasets (C4D) Project was mentioned as an extension of the IRIOS Project to dataset management at the University of Sunderland. As stated in the project presentation, C4D aims to 'CERIFy' existing research dataset metadata conventions, and hence provide access to research data in an environment which also holds information on research projects and research outputs. C4D will also explore the commonality of research dataset metadata, and how much can be represented in CERIF.

Sunday, 17 July 2011

KULTURising research repositories


  "...I can only add that research for art, craft and design needs a great deal of further research. Once we get used to the idea that we don't need to be scared of 'research' - or in some way protected from it - the debate can really begin."
(Christopher Frayling, RCA Rector (1996-2010), from: "Research in Art and Design" (Royal College of Art Research Papers, Vol 1, No 1, 1993/4). Royal College of Art, London).


  At the Jul 6th meeting at JISC Brettenham House some planning was also done for the Sonex extension, besides that of SWORDv2. In the framework of this project extension, Sonex is expected inter alia to further support the JISC Deposit Projects and continue to gather international deposit use cases, as well as to provide some recommendations on how to improve deposit.

As part of this further involvement with the JISC Deposit Projects, Sonex attended the Kultivate Project Conference on Jul 15th at the Royal Institute of British Architects (RIBA).


Based at the Visual Arts Data Service (VADS), a research centre at the University for the Creative Arts, and funded by the JISC from late November 2010 to the end of July 2011 within the JISC Deposit strand, the Kultivate Project aims to "share and support the application of best practice in the development of institutional repositories that are appropriate to the specific needs and behaviours of creative and visual arts researchers". Kultivate builds upon the knowledge and experience of the Kultur II group, which grew out of the JISC funded Kultur project (2007-2009). The Group currently consists of over forty institutions and projects and is led by the VADS.

Specific goals of the Kultivate project are:

- to increase the rate of arts research deposit,
- to enhance the user experience for researchers, and
- to develop and sustain a sector-wide community of shared best practice in arts research repositories.

There are significant differences between Kultivate and the rest of the JISCdepo projects (RePosit, DURA and DepositMO). While the other three deal specifically with semi-automating the ingest of widely recognised content into repositories (mainly by fostering platform interoperability), Kultivate seeks to extend the coverage of institutional repositories to the creative arts environment, whose output is rather different in nature from such well-accepted research and has not been specifically addressed as scholarly output so far. In this regard, Kultivate can be seen both as something of an outlier project and as the most challenging of the four.


After eight months of hard work, the Kultivate Project Conference put together a model set of talks and presentations (see programme and updated presentations) to introduce the project outcomes.

Several talks offered introductory reflections on what creative arts research should be, with its specific peculiarities. Whether or not the output from activities in the creative arts is called research (artists themselves sometimes sound a bit surprised at being called researchers) doesn't seem that relevant anyway. The main thing is that it is scholarly output from many HEIs and Arts Schools, and as such it should be subject to standard deposit into institutional repositories.

However, it is often hard to persuade artists to have their work filed in repositories ("the repo doesn't fit the needs of creative artists" is a frequent reason given for not taking part in the project). In this regard, advocacy is particularly critical for institutional projects being carried out in the area: they are breaking new ground in a discipline where nothing like PubMed, Chemical Abstracts or arXiv exists (so far).

See examples of effective advocacy under the Kultivate project umbrella at Goldsmiths Research Online and UAL Research Online, plus Kultivate's own Advocacy Toolkit, one of the project's main outputs.

Another relevant advance Kultivate is promoting is the setting of metadata standards for the description of creative artworks (something that incidentally brings the project closer to the data management strand than to the deposit one, making it quite a heterodox one). See for instance 'The listening room' item at UAL Research Online, with its four-tabbed description including metadata as well as images and videos (thus effectively answering artists' frequent complaints about work documentation: "I did a performance, not a video" or "Fine, but where am I?").


The Performance Art Data Structure (PADS), for which the unit subject to description is the 'work' rather than the 'digital object', is yet another solution for complex description of creative arts output. It was developed by the University of Bristol within the JISC-funded CAiRO Project for Complex Archive Ingest for Repository Objects (see the example PADS record for the 'Becoming snail' performance by Paul Hurley at JISC Digital Media).
PADS is also involved in the Europeana attempt to standardise performance metadata across the EU.

Finally, a good (and growing) number of EPrints-based implementations of the Kultur enhancements for designing institutional repositories focussed on creative arts output were presented at the project conference (incidentally raising questions from DSpace-based IR managers on when something similar will be developed for DuraSpace). Kultivate has also provided (in cooperation with the University of Southampton team) a set of technical enhancements to the EPrints platform, among them improvements to the MePrints application and the IRStats package.


Implementation of those enhancements by different institutions (either arts-focussed ones or general-purpose ones with Arts Departments within them) is giving rise to a wave of repository KULTURisation (i.e. adaptation to deal with creative arts output) across the UK that might well spread beyond it once working standards are consolidated. In the meantime the VADS-led eNova project is already building upon the outputs of both the Kultivate and Kultur projects.

Sunday, 15 May 2011

A first analysis of data management


As previously mentioned in this blog, the Sonex workgroup is now trying to extend its use case scenario analysis on 'Deposit opportunities into repositories' to the realm of research data. A first meeting held at EDINA on Mar 30th served the purpose of drawing a general picture of the data management landscape.

It should be stressed that the ways of handling SSH and STM data may differ substantially. Given the strong IASSIST attachment of some Sonex members, the workgroup's initial approach to data management may therefore be a bit biased towards procedures in the area of Social Sciences and Humanities. However, attention will also be paid to specific ways of dealing with STM datasets as the analysis gets fine-tuned.

Moving along the same lines as we did for research articles, we first try to tackle the ACTIONS scope. Data deposit is certainly an issue, but there's more to data-related processes than just deposit. It's also about Access to data and also about Data Notification/Register.

Next we get on to the WHAT and the WHO. Answer to WHAT? is a data set. Previous analysis by Peter Burnhill shows -at least- three different types of research data (see image below).


Dealing mainly with the data file itself, this data type classification is somewhat narrow for the general picture of data management, so Sonex would rather set a new and more generic data classification for answering the question WHAT is there to deposit:

  • Metadata record

  • Codebook or user guide, where all necessary information is provided to allow for data re-use*

  • Raw data or dataset file(s)


* See a DCMI-based description at: Inter-university Consortium for Political and Social Research (ICPSR). (2009). Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle (4th ed.). Ann Arbor, MI. Section 'Important documentation elements', p. 22

These three elements should ideally be supplied as a single package.

As to the question of WHO performs each data-related operation (Notification-Deposit-Grant access), a handful of running projects within the JISC MRD (phase I) programme should serve to test the different use cases resulting from a double-entry 'Action/What' table as featured below.
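
As a rough illustration of the double-entry idea, the 'Action/What' combinations can be enumerated in a few lines of Python. This is only a sketch: the action and component labels follow the wording used in this post, while the "researcher" entry is a hypothetical value added purely to show how a cell would be filled in once a project has been surveyed.

```python
from itertools import product

# The three data-related actions and the three deposit components named above.
ACTIONS = ("Notification/Register", "Deposit", "Grant access")
COMPONENTS = (
    "Metadata record",
    "Codebook or user guide",
    "Raw data or dataset file(s)",
)


def empty_use_case_grid():
    """Build an Action x What grid; each cell will hold WHO performs it."""
    return {(action, what): None for action, what in product(ACTIONS, COMPONENTS)}


grid = empty_use_case_grid()

# Hypothetical entry, for illustration only (not a finding from any project).
grid[("Deposit", "Metadata record")] = "researcher"

# Cells still waiting for an answer to the WHO question.
unassigned = [cell for cell, who in grid.items() if who is None]
```

Each cell of the grid is one candidate use case, so a project survey would amount to filling in the WHO for as many of the nine cells as apply.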


Next step as we proceed to further development of this preliminary analysis should be a survey for gathering information on procedures for data handling as carried out in specific JISC MRD projects.

Wednesday, 27 April 2011

National initiatives for promoting data management strategies: an overview


- "Hello, I want to deposit my data"
- "Sir, this is a library!"
- "Sorry" -he whispers- "I want to deposit my data".
(as told by Brian Hole, British Library, along his presentation of the DRYAD UK initiative)


  The main objective of the JISC MRD International Workshop held last month was to review progress achieved by the JISC Managing Research Data Programme and to discuss this in the context of broader international developments.

As stated in the workshop programme overview, "this dimension reflects key partnerships which JISC, the JISCMRD Programme and the DCC has been building through the IDCC Conference, the Knowledge Exchange and other initiatives. They include the Australian National Data Service, the NSF funded DataNet Projects, institutions in the US and Australia, the DFG, SURF, DANS etc".

Within the broader context, besides a couple of preliminary talks on the European Union approach to (and future funding of) data management initiatives (by John Wood, on the EU 'Riding the wave' report, and by Carlos Morais-Pires on the Digital Agenda for Europe), the workshop featured a specific session on "National and international infrastructure initiatives" whose first panel was called "Approaches and strategies in the UK, US, and Germany". Australian and Dutch national or specific approaches were also discussed, either at this session or later during the event.

It was reassuring to see such a broad range of strategies and already running projects taking place at the same time in so many different countries. Besides the national initiatives featured in this and further sessions during the meeting, there are also additional, sometimes preliminary, initiatives for promoting data management policies at national or institutional level in other countries such as Finland, Portugal, France, Poland or South Africa.

As new initiatives for research data management keep steadily coming up, this session was an opportunity to get an informal update on DCC's report 'Comparative Study of International Approaches to Enabling the Sharing of Research Data' - see its summary and main findings here as of Nov 2008.

Digital Curation Centre - UK
Kevin Ashley, Digital Curation Centre (DCC), described the present picture of data management in the UK as "a new context", where Universities are increasingly willing to take responsibility for data management (especially in areas not covered by Data Centres).
Now that UK funder and NSF rules for Data Management Planning are being implemented, this advance planning is becoming very important for funders, researchers, institutions, collaborators and reusers. Current DCC tasks include integrating different Data Discovery Services and building institutional capacities: skills, policies, etc. Besides that, DCC is providing the new DMP Online service, aimed at producing and maintaining Data Management Plans.
The good news is that, despite varying degrees of involvement, institutions in the UK have accepted their role in RDM.

NSF-funded DataNet Projects - US
A summary of the present state of research data management in the US was provided by presentations of the DataONE and DataConservancy initiatives, delivered respectively by William Michener (University Libraries at U New Mexico) and Sayeed Choudhury (Johns Hopkins University).

After stating that "researchers are presently using 90% of their time managing data instead of interpreting them", W. Michener presented the Data Observation Network for Earth (DataONE) initiative (a live DataONE presentation at U of Tennessee is available). This NSF-supported initiative aims to ensure preservation of and access to multi-scale, multi-discipline, and multi-national science data. DataONE Coordinating Nodes around the world will help achieve the international collaboration needed to solve the grand science and data challenges, particularly with regard to education.

The DataConservancy initiative aims to research, design, implement, deploy, and sustain data curation infrastructure for cross-disciplinary discovery with an emphasis on observational data. S. Choudhury's presentation stressed the need for data preservation as a necessary condition for data reuse and introduced the recent connection of data and publications through arXiv.org as one of the pilot projects that build upon the Project APIs.

DFG - Germany
New DFG information infrastructure projects in Germany were presented by Dr Stefan Winkler-Nees, who mentioned both the Jan 2009 DFG Recommendations for Secure Storage and Availability of Digital Primary Research Data, as a base report for promoting standardized work in the data management area, and the DFG's current call for proposals "Information infrastructures for research data". The projects selected in this call are due to be announced shortly and will start in May/June 2011. Finally, in a line of thought shared with other initiatives, Dr Winkler-Nees mentioned that DFG is aiming at the teaching and qualification of both researchers and data curators.


SURF Foundation & DANS - The Netherlands
Later on in the workshop, John Doove presented the SURF Enhanced Publications initiative within the SURFshare programme 2007-2011. Six new projects funded during 2011 by the SURF Foundation will allow researchers from a variety of disciplines to share datasets, illustrations, audio files, and musical scores with fellow researchers in the context of Enhanced Publications (programme video available on YouTube). There have already been two previous grant rounds for Enhanced Publications. The six running projects, whose results are due in May 2011, take place within five disciplines: Economics (Open Data and Publications, Tilburg University), Linguistics (Lenguas de Bolivia, Radboud University Nijmegen, and Enhanced NIAS Publications, KNAW-Royal Netherlands Academy of Arts and Sciences), Musicology (The Other Josquin, University Utrecht), Communication sciences (Enhancing Scholarly Publishing in the Humanities and Social Sciences, KNAW) and Geosciences (VPcross, KNAW).

The picture of the Dutch strategy for increasing the amount of research data available online was completed by the presentation "Sustainable and Trusted Data Management", delivered by Laurent Sesink (DANS-Data Archiving and Networked Services). DANS, established in 2005, deals with storage and continuous accessibility of research data in the social sciences and humanities and promotes the 'Data Seal of Approval' for certification of data repositories, which guarantees through a series of required criteria a high-quality and reliable way of managing research data.

Australian National Data Service (ANDS) - Australia
Finally, Andrew Treloar, Director of Technology, Australian National Data Service (ANDS), supplied a comprehensive perspective from a national infrastructure provider and in a way summarized the previous talks by saying that, despite differences, there are common themes emerging in national approaches to data management, as there are things only national bodies can do. In his plenary presentation "Data: Its origins in the past, what the problems are in the present, and how national responses can help fix the future" he mentioned for instance that Hubble Space Telescope-related publication statistics show that twice as much research is being done thanks to data reuse. Efficiency, validation, integrity of scholarly records, value for money and self-interest were listed as (non-altruistic) arguments for data reuse.

Having the chance to attend this series of brilliant presentations and seeing how policies for opening access to research data keep spreading across institutions and countries were undoubtedly among the highlights of the Birmingham workshop. The next opportunity to keep up with it all will be next November at the Knowledge Exchange Workshop on Research Data Management in Bonn, Germany.

Monday, 25 April 2011

Could external cooperation improve collection of specific JISC MRD project-related information?


  In the coming days SONEX will be publishing some posts on the JISC MRD Programme International Workshop held on March 28-29th at Aston Business School Conference Centre, Birmingham. Certain aspects debated at this comprehensive meeting were very useful for establishing a SONEX approach to research data management, as discussed at a SONEX meeting at EDINA on Mar 30th whose outcome will also be blogged shortly.

See the report by Brian McMahon (IUCr) for a general review of the JISC MRD workshop.

One of the most visible disciplinary approaches to data management presented at the JISC MRD event -which featured all kinds of institutional and subject-based initiatives in the area- was the one coming from meteorology, palaeoclimatology and climate-related sciences: there was a presentation of the PEG-BOARD Project (U of Bristol) at the Subject-Oriented Approaches session on Monday, followed by ACRID (U of East Anglia & STFC) and Metafor (BADC & STFC) Project presentations on Tuesday afternoon.

One of the most relevant features of these climate-related projects is interdisciplinarity. The PEG-BOARD Project in particular aims to serve the archaeology research community by supplying it with palaeoclimate data.


A few specific aspects of PEG-BOARD were discussed after the project presentation. The interesting thing about them is that they were neither mentioned during the talk nor reported on the project site:

- Due to the project's interdisciplinarity, there are two clearly different user groups for the palaeoclimatology data produced: climatologists, who understand the nature of the datasets involved, as these are central to their discipline, and archaeologists, who do not and need not know much about the data format but need the information contained in it for their own purposes. Archaeologists thus act as regular non-technical users of the project rather than as researchers. However, as they are indeed researchers, the feedback they may provide on the project outcome could be all the more valuable.

- What archaeologists care about in the end is the data plots, and Data Centres will not provide such processing. So what PEG did was implement specific software capabilities that address the needs of non-technical data users (i.e. archaeologists), so as to allow them to search for the plots or false-colour graphics they need. This piece of middleware is a conceptual key feature of the project in terms of deliverables.

- Climate data is usually archived in binary format, so it's often not easy to process. The UK Met Office provided lots of information, often incomplete or in old formats. The process of adapting raw data to the project's needs was very interesting and worth disseminating.

- Climate models were written in FORTRAN. When re-written or translated into C++, the results would vary for the same data arrays due to specific treatment by the code. That poses a quite amazing challenge in terms of model interpretation.

- When asked whether researchers provided enriched metadata for their data, the answer was that there's usually an input in terms of past experiments, i.e. "this is the data outcome of such and such experiment when changing initial conditions in such a way". Such-and-such experiment would in turn be described the same way, until one was reached that wasn't described at all.
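
As an aside on the FORTRAN/C++ point above, one common reason why translated numerical code can give different results for the same arrays is that floating-point addition is not associative, so a different evaluation order changes the rounding. The snippet below is only a minimal sketch of that general effect, not PEG-BOARD code: the values and the function name `naive_sum` are hypothetical, chosen so that the rounding is visible, and it is an assumption that this mechanism played a part in the project's case.

```python
import math


def naive_sum(values):
    """Add values left to right, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical values: near 1e16 the spacing between doubles is 2,
# so adding 1.0 to 1e16 is lost to rounding.
data = [1e16, 1.0, -1e16, 1.0]
reordered = [1e16, -1e16, 1.0, 1.0]  # same numbers, different order

forward = naive_sum(data)         # one of the 1.0 terms is rounded away
regrouped = naive_sum(reordered)  # the large terms cancel first
exact = math.fsum(data)           # correctly rounded sum of the same data

print(forward, regrouped, exact)
```

The same four numbers give different totals depending only on the order of the additions, which is the kind of discrepancy a compiler or a rewrite in another language may introduce without any change to the underlying model.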

The fact that none of these project aspects is recorded or discussed on the project blog raises the question of whether an external approach to data management projects might collect and disseminate very interesting information that researchers may not consider relevant enough to discuss on project blogs. Such an external approach to running projects might be carried out by data librarians in order to share these specific project details with the data management community.

For whatever it may be worth, Sonex would be keen to do this kind of job for the MRD community.

Tuesday, 5 April 2011

I2S2 Project workshop at RAL-STFC


  During a busy week in terms of research data management events (due to be reported shortly on this blog), last Friday Apr 1st Sonex had the opportunity -thanks to Simon Hodson, JISC MRD programme manager- to attend the I2S2 Project workshop at the Rutherford-Appleton Laboratory (RAL) at STFC in Didcot. I2S2 -standing for 'Infrastructure for Integration in Structural Sciences'- is a JISC MRD project ending in Mar 2011 that aims to "identify requirements for a data-driven research infrastructure in 'Structural Science', focusing on the domain of Chemistry, but with a view towards inter-disciplinary application".


Several presentations were delivered during the meeting: Brian Matthews on the I2S2 project achievements, ICAT architecture and CSMD metadata standard; Brian McMahon, International Union of Crystallography (IUCr), on 'Information Management and Publication in Crystallography'; Tom Griffin on the TopCAT GUI for management of data coming out of the STFC ISIS and DIAMOND facilities; Steve Androulakis on the TARDIS ANDS-supported project at Monash University; Mark Borkum on OreCHEM files; Chris Morris on PiMS (Protein Information Management System); and Juan Bicarregui on the EU PANData project.

During the IUCr presentation the need was identified for filing and preserving different data categories such as raw measurements, processed numerical data, derived information and the parameters. The convenience of providing access to raw diffraction images was also stressed during the talk, these files being a few GB in size, and thus not large enough for Data Centres but too big for sites such as CCDC. A review of Crystallographic Information Framework (CIF) file formats was provided, with imgCIF being used for storing raw data out of the experiment, .fcf for including structure factors after data reduction, and a final stage of structure solution and refinement being performed in the lab before the author starts formatting those into an IUCr paper, which would translate CIF into SGML for producing final fcf, cif, pdf and html versions.

Raw data was said to be kept for 183 days at STFC and 3 months at the Australian Synchrotron (in which TARDIS is involved), and a discussion followed on the fact that some agreement should be reached on the kind of data that ought to be stored and preserved. The process of attaching DOIs to datasets was also discussed, IUCr being presently involved in projects such as XYZ or Open Bibliography in order to promote this objective.


A TopCAT demo was provided by Tom Griffin. This open source GUI (see image above) is being used for storing raw data from STFC facilities such as ISIS and DIAMOND. TopCAT provides access to its contents through an open registration system, thus operating as a sort of STFC institutional data repository, and would be potentially applicable to other institutions, facilities and disciplines.

The TARDIS presentation by Steve Androulakis, Monash Univ, Australia, mentioned their use of XML/METS metadata standards for research data description at the federated institutional repository platform, which was initially meant to store X-ray diffraction images and later evolved into a much larger initiative with applications in microscopy (MicroTARDIS), particle physics and gene processing through the Squirrel software.

Finally, extra presentations were delivered on PiMS (Protein Information Management System) by Chris Morris, STFC and on the European PANData project by Juan Bicarregui, STFC e-Science. PANData aims to build Photon and Neutron Data Infrastructure through a consortium of European synchrotron facilities and neutron sources.


A final summary was made of the whole set of presented I2S2-related features (imgCIF, CIF, IUCr/XML/RDF BIBLIO, PDBML, CML, ICAT, TopCAT, ICAT Lite/CSMD, TARDIS, PiMS, PANData, NeXuS) by mapping them onto the I2S2 Idealized Scientific Research Activity Lifecycle Model (see image above - may click on it for an updated version). References were also made to other initiatives not represented at the meeting, such as the Quixote Project for Computational Chemistry CML data management or Protein Production and Crystallization.

Thursday, 3 March 2011

Repository take-up and embedding: the future of repositories


  Being already in Birmingham for the JISC Deposit Project Meeting on Mar 1st, Sonex stayed in town to attend the JISC Repositories Take-Up and Embedding Meeting as well. The start-up meeting for this new JISC programme aimed to outline the future of repositories, dealing with specific issues such as (automated) deposit, shared services like RoMEO or OpenDOAR, repository integration into general software infrastructures for research information management and the promotion of national (via RSP) and international (via KE, COAR and OpenAIRE) collaboration.

Six projects were presented at this programme start-up meeting:

- Bringing a Buzz to NECTAR (Miggie Pickton, University of Northampton)
- Hydrangea: letting the repository flower (Richard Green, University of Hull)
- MIRAGE 2011: Repository Enrichment from Archiving to Creation (Xiaohong Gao, Middlesex University)
- Enhanced interface design for supporting take-up and embedding of the Glasgow School of Art research repository, including visual engagement with practice led and applied outputs (Robin Burgess, Glasgow School of Art)
- eNova (Marie-Therese Gramstadt, VADS)
- EXPLORER: Embedding eXisting & Proprietary Learning in an Open-source Repository to Evolve new Resources (Alan Cope, De Montfort University)

An extra postprandial presentation on repository consolidation within a university research information management environment, and on the way this was done for the University of Glasgow's Enlighten IR, was delivered by William Nixon. Statements like "Silos are the past, embedding repositories -through the use of tools like Sword or LDAP- is the future" made the point on how repositories should evolve in the future. According to William, repositories are to exploit new opportunities for data mining, business intelligence, KPIs, analytics, 'stickiness' and visibility (some of these issues being thoroughly dealt with on the Enlighten repository blog).

There was a remarkable presence of image-related projects among the presentations, with Glasgow School of Art, eNova and MIRAGE 2011 all dealing with archiving of images into repositories one way or another. This is great news for the momentum-gaining development of new information infrastructures in the area (also traceable at the JISC Deposit Programme meeting the day before), which will no doubt benefit from these projects' outcomes.

After watching the project presentations from a Sonex point of view, it seems they could particularly benefit from interacting with JISC Deposit projects in terms of implementing the resulting strategies for automated content ingest into repositories. A handful of the take-up and embedding projects would thus be the soundest candidates for initial "customer implementation" of the various methods coming from the Deposit strand for quick population of repositories with institutional research output (the take-up bit, prior to embedding). As these projects will run until the end of 2011 and the ones from the Deposit strand should deliver around July, interaction among them could probably be easily achieved.

There was one particular project among those presented that captured Sonex's attention: MIRAGE 2011, Middlesex Medical Image Repository with a Content-Based Image Retrieval Systems Archiving Environment. MIRAGE is both an image-related repository project (as it deals with medical images) and a research data project, and it's this latter feature that brings it fully within the scope of Sonex activity with regard to research data management. Ongoing data management projects (either JISC-funded or otherwise) usually deal with either numerical or textual data, but projects dealing with the deposit of graphical research data are rare (save for the DMBI project, Data Management in Bio-Imaging, run at The John Innes Centre, BBSRC, Norwich).

A couple of references were shared with MIRAGE project manager Dr. Xiaohong Gao so as to promote synergies among different projects in the same area: the 'Feeding Neuroimaging Repositories' poster presented at OR2010 Madrid last July by a team of Universitat Autònoma de Barcelona (UAB)-Hospital de la Santa Creu i Sant Pau researchers in Barcelona, and the MIDAS/National Alliance for Medical Image Computing (NAMIC) medical image repository.

The meeting presentations will shortly be available.

Wednesday, 2 March 2011

JISC Repository Deposit Programme Meeting in Birmingham


  A JISC Repository Deposit Programme meeting was held on Mar 1st, 2011 at Maple House Birmingham. Under coordination from Balviar Notay, JISC manager for the Deposit projects, presentations were delivered by representatives of the four presently running projects under the JISC Deposit call: DepositMO (Steve Hitchcock, U Southampton), DURA (John Norman, UCam), RePosit (Ian Tilsed, Leeds U) and Kultivate (Marie Therese Gramstadt, VADS). Additional presentations were given for the deposit-related Open Access Repository Junction (OA-RJ) project (Theo Andrew, EDINA), Sword v2 (Richard Jones - Symplectic) and Sonex (Pablo de Castro, Carlos III University Madrid) projects.


Lots of interesting issues were raised and discussed during the set of presentations, and specific teamworking activities were later carried out to promote cooperation between projects. This was the first opportunity for representatives of all projects involved in the JISC Deposit programme to meet the other projects in person and learn about their progress and potentially complementary findings.

Several complementary visions of deposit were outlined during the workshop: a quite technical one from projects such as DepositMO and Sword, an advocacy-focused approach from the RePosit project aiming to increase engagement with repositories, and a vision from DURA of repositories as potential suppliers of the global institutional research output required for REF purposes.

Steve Hitchcock (DepositMO, implementing Sonex usecase scenario nr 4, Deposit via personal software) delivered a few demo examples of Swordv2-assisted deposit into the DepositMO test repository via the local computer file manager, including deposit of a previously parsed full-text document whose metadata was ingested as well, thus achieving the metadata+object transfer. A key question on document deposit for management vs publishing purposes was also raised during the DepositMO presentation: are repositories (or could they evolve into) a proper environment for document management, or does the Open Access philosophy prevent them from being used as cooperative tools, for example for pre-print editing by a group of authors?
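
For readers who have not seen what a Sword-style deposit involves, a binary deposit is essentially an HTTP POST of a packaged file to a repository collection, with a few headers describing the package. The snippet below is only a minimal sketch and not DepositMO's code: it assumes the header names and the SimpleZip packaging identifier as we understand them from the SWORD 2.0 profile, the hex-encoded MD5 checksum is likewise an assumption, and the function name `build_deposit_headers` and the file name are hypothetical. No request is actually sent.

```python
import hashlib

# Packaging identifier for a plain zip package, as we understand it
# from the SWORD 2.0 profile (an assumption, not taken from the talk).
SIMPLE_ZIP = "http://purl.org/net/sword/package/SimpleZip"


def build_deposit_headers(filename, payload, in_progress=False):
    """Return the HTTP headers for a hypothetical binary deposit."""
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        # Checksum of the package so the server can verify the transfer.
        "Content-MD5": hashlib.md5(payload).hexdigest(),
        "Packaging": SIMPLE_ZIP,
        # "true" tells the server the deposit may be completed later.
        "In-Progress": "true" if in_progress else "false",
    }


# Hypothetical package: the bytes would normally be read from a zip file.
headers = build_deposit_headers("article.zip", b"example package bytes")
for name, value in headers.items():
    print(f"{name}: {value}")
```

A file-manager integration of the kind demonstrated would presumably build something along these lines behind the scenes whenever a file is dropped into a watched folder.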

The DURA and RePosit projects, implementing Sonex usecase nr 2, CRIS/IR integration, are both dealing with making deposit as easy as possible for the author community: DURA by ingesting previously synced inputs from Mendeley and Symplectic Elements into IRs, and RePosit by specifically “increasing engagement with repository” through a set of awareness-raising materials and campaigns later to be shared with other projects.

Kultivate, aiming to increase deposit in the arts and design environment, is both the newest and possibly the most innovative project in the strand. Repository development having been strongly focused on research papers as a main research output, work on so far underexploited creative arts materials gives Kultivate the opportunity to set new standards and provide new resources to the Open Access repository community.


Further presentations followed for projects providing general-purpose deposit infrastructure, such as the EDINA Open Access Repository Junction (OA-RJ) middleware for discovery and Sword-assisted deposit. OA-RJ is already live-testing its broker for automated transfer of content from publishers or subject repositories into specific target repositories. Richard Jones described the ongoing process for developing Sword-v2, which will deliver fine-tuned functionalities for metadata+object automated transfer to the rest of the Deposit projects and the wider repository community, resulting in higher deposit rates. Finally, a Sonex presentation stressed the need for re-examining the Sonex deposit usecase scenarios so as to cover new types of materials such as research data, creative arts materials and learning materials. Sonex also suggested that the common strategy for measuring the success of JISC-funded deposit projects, which is being designed at Birmingham City University Evidence Base, might include specific questions for repository managers, such as whether any given automated deposit strategy was used for content ingest, besides the specific strategies for measuring success devised by the projects themselves.

The workshop presentations will shortly be available at the Deposit wiki. Once the Deposit projects are completed, another programme meeting will be held to share conclusions and examine case studies and success stories so that the resulting solutions can be widely implemented.

Saturday, 18 December 2010

Sonex at the "Digital Library Research and Open Access: Interoperability Strategies" workshop

After delivering its paper "Handling repository-related interoperability issues" last Sep at the 2nd DL.org workshop in Glasgow, Sonex will be contributing a presentation at the forthcoming DL.org "Digital Library Research and Open Access: Interoperability Strategies" one-day event to be held at the British Academy in London next Feb 4th.

The Sonex contribution will be part of this DL.org workshop dealing with digital libraries, Open Access repositories and interoperability among them. The conference programme, already available, includes presentations on the DL.org reference model, the DL.org policy and quality interoperability survey, the degree of progress of Open Access repositories with regard to interoperability issues in the UK and Europe, and research data library management, among others.

Sunday, 12 December 2010

A preliminary list of discipline-specific projects on research data management

A preliminary list follows of currently running discipline-specific projects and initiatives (as of Dec 2010) dealing with research data management. The list below is not comprehensive, but a sample of ongoing projects, brought together in order to identify potential biases by area in current research data management projects. Should there be relevant projects missing, we’d appreciate a notification so that we can include them as well.

[projects/initiatives listed in alphabetical order]

Project name: ACRID: Advanced Climate Research Infrastructure for Data
Institution/Funder/Manager: U East Anglia, STFC, Met Office, JISC
Project Description: The ACRID Project aims to develop an approach to publishing climate research data in a way that facilitates citing, re-use and the provision of full provenance information for processed data.
Area/Discipline: Climate Science


Project name: ADMIRAL
Institution/Funder/Manager: U Oxford, JISC
Project Description: A data management infrastructure for research across the life sciences
Area/Discipline: Life Sciences


Service/Project name: ADS: Archaeology Data Service
Institution/Funder/Manager: U York, AHRC, JISC, EU (mandated repository for AHRC, NERC)
Service/Project Description: The Archaeology Data Service supports research, learning and teaching with high quality and dependable digital resources. It does this by preserving digital data in the long term, and by promoting and disseminating a broad range of data in archaeology. The ADS promotes good practice in the use of digital data in archaeology, it provides technical advice to the research community, and supports the deployment of digital technologies.
ADS is actively engaged with research projects working with partners in all sectors of UK archaeology.
Area/Discipline: Archaeology


Project name: Global Argo Data Repository
Institution/Funder/Manager: NOAA, NODC (National Oceanographic Data Center), GODAE (Global Ocean Data Assimilation Experiment), IFREMER (Institute for Research and Exploitation of the Sea)
Project Description: In the year 2000, a global array of approximately 3,000 free-drifting profiling floats, known as the Argo Ocean Profiling Network, was planned as a major component of the ocean observing system. Argo originated from the need to make climate predictions on both short and long time scales and has led to international participation and collaboration to ensure global coverage.
Centers to handle the data collected by profiling floats have been established in a number of countries. These centers normally handle data from their nationally deployed floats, but sometimes provide that service to other countries or organizations. All Argo data will be publicly available in near real-time via the GTS (Global Telecommunications System) and in scientifically quality-controlled form with a few months delay.
Area/Discipline: Marine Sciences, Oceanography


Project name: BlueObelisk
Institution/Funder/Manager: Group of chemists/programmers/informaticians
Project Description: The Blue Obelisk Data Repository lists many important chemoinformatics data such as element and isotope properties, atomic radii, etc. including references to original literature
Area/Discipline: Chemoinformatics


Project name: BRIL: Biophysical Repositories in the Lab
Institution/Funder/Manager: CeRch-KCL, JISC
Project Description: The BRIL project aims to enhance the repository facilities at the Randall Division of Cell and Molecular Biophysics at King’s College London by:
- Embedding the repository within the researchers’ day-to-day research and experimental practices
- Allowing data and metadata to be captured in automated fashion
- Allowing the structure of experimental processes as a whole to be captured, modelled and stored within the repository
- Enhancing browse and access facilities and data exchange facilities to increase interoperability.
Area/Discipline: Biophysics


Project name: CAiRO: Curating Artistic Research Output
Institution/Funder/Manager: U Bristol, DCC, JISC
Project Description: Research data created by the UK’s performance and visual arts departments is often rich, technically complex and amazingly varied in nature. This work may include interconnected multimedia records of a single live event or software which exhibits complex behaviours dependent upon the choices made by a viewer. The CAiRO project, funded as part of the wider JISC Managing Research Data programme, aims to offer data management skills tailored to the special requirements of the arts researcher-practitioner.
Area/Discipline: Creative Arts


Project name: The CEACS Data Library
Institution/Funder/Manager: CEACS Library, Center for Advanced Study in the Social Sciences (CEACS), Instituto Juan March, Madrid, Spain
Project Description: The CEACS Data Library provides support to its research community in conducting quantitative research with primary and secondary data. The Data Library has a collection of over 2,000 secondary research datasets from major data centres. The service supports research data management through a thematic website, one to one support and a Dataverse data repository to help with the management, sharing and preservation of the data produced by researchers.
Area/Discipline: Social Sciences


Project name: Data Conservancy: A New Vision for Data-Driven Science
Institution/Funder/Manager: National Science Foundation (NSF), Johns Hopkins University (Lead institution)
Project Description: The Data Conservancy (DC) embraces a shared vision: scientific data curation is a means to collect, organize, validate and preserve data so that scientists can find new ways to address the grand research challenges that face society.
Area/Discipline: Astronomy, Earth Sciences, Life Sciences and Social Sciences


Project name: DataONE
Institution/Funder/Manager: National Science Foundation (NSF)
Project Description: DataONE was conceived to ensure preservation and access to multi-scale, multi-discipline, and multi-national data about life on earth and the environment that sustains this life. It was recognized from the outset that such data are often difficult to discover, access, integrate and analyze.
Area/Discipline: Earth & Life Sciences


Project name: DataTrain
Institution/Funder/Manager: U Cambridge, ADS, DCC, JISC
Project Description: The DataTrain project aims to build on findings and tools developed in the Incremental project (JISC 07/09 funding strand), to design discipline-focused data-management training modules for post-graduate courses in Archaeology and Social Anthropology at the University of Cambridge.
Area/Discipline: Archaeology, Social Anthropology


Project name: DATUM for Health: Research data management training for health studies
Institution/Funder/Manager: Northumbria U, DCC, JISC
Project Description: This collaborative project seeks to promote research data management skills of postgraduate research students in the health studies discipline through a specially-developed training programme which focuses on qualitative, unstructured research data.
Area/Discipline: Health Sciences


Project name: DMBI: Data Management in Bio-Imaging
Institution/Funder/Manager: The John Innes Centre (BBSRC), Norwich BioScience Institutes, JISC
Project Description: DMBI aims to raise the level of data management/handling for high-throughput bio-imaging, and strengthen the interactions between image data silos, both internally and with partner organisations.
Area/Discipline: Biology/Bio-imaging


Project name: DMP-ESRC: Data management planning for ESRC research data-rich investments
Institution/Funder/Manager: UK Data Archive (UKDA), Economic and Social Research Council (ESRC), Joint Information Systems Committee (JISC)
Project Description: Data Management Planning (DMP) project aims to increase the data management and sharing capability within the social sciences community.
Area/Discipline: Social Sciences


Project name: DMTpsych: Data Management Training for psychologists
Institution/Funder/Manager: U York, U Sheffield, Sheffield Hallam U, DCC, JISC
Project Description: The aim of DMTpsych is to build capacity and skills within psychology postgraduates relating to research data management. The project builds upon existing research data management materials developed by the Digital Curation Centre (DCC) to create discipline-focused postgraduate training materials that can be embedded into postgraduate research training for the psychological sciences.
Area/Discipline: Psychology


Project name: DRYAD UK
Institution/Funder/Manager: British Library, University of Oxford, JISC
Project Description: Dryad is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences as published by a Consortium of Journals. Dryad UK aims to expand Dryad into the UK by establishing a UK mirror site and extending service to new publishers and disciplines.
Area/Discipline: Biomedical Sciences


Project name: EDgrid Central: Data Repository System for 3-D Full-Scale Earthquake Testing Facility
Institution/Funder/Manager: National Institute for Advanced Industrial Science and Technology, Japan
Project Description: A data repository system called EDgrid Central is designed for storing huge amounts of experiment data produced by a 3-D full-scale earthquake testing facility. EDgrid Central provides large storage capacity and implements data modeling for the shake test in the backend. The frontend is a portal for users to retrieve the stored data by meta-data search and bulk download. This system builds on NEEScentral, developed by the NEES project in the United States, enhancing its search and download functionalities according to the EDgrid users' requirements. EDgrid Central allows facility sites to have a permanent repository of the shaking table experiment and it also enables civil engineering researchers to share their data and reports in their daily activities.
Area/Discipline: Geophysics


Project name: EIDCSR: Embedding Institutional Data Curation Services in Research
Institution/Funder/Manager: U Oxford, JISC
Project Description: The Embedding Institutional Data Curation Services in Research (EIDCSR) project aims to address the data management and curation requirements of three collaborating research groups in Oxford, by scoping their requirements and embedding selected elements of the digital curation lifecycle, including policy, workflow, and sustainability solutions within the research process. The workflows generated by the project are intended to scale to include other research domains and the outputs should be of use to other research intensive institutions. Project runs until Dec'10.
Area/Discipline: Medical & Life Sciences


Project name: ERIM: Engineering Research Information Management
Institution/Funder/Manager: U Bath, UKOLN, JISC
Project Description: ERIM aims to specify in practical terms how effective data management can be enabled and supported in research projects, particularly to support reuse or, more broadly, what can be thought of as 're-purposing'. The project will look primarily at the engineering research domain.
Area/Discipline: Engineering


Project name: EURO VO: European Virtual Observatory
Institution/Funder/Manager: CNRS, ESO, INAF, U Edinburgh
Project Description: The Virtual Observatory (VO) is an international astronomical community-based initiative. It aims to allow global electronic access to the available astronomical data archives of space and ground-based observatories and other sky survey databases. It also aims to enable data analysis techniques through a coordinating entity that will provide common standards, wide-network bandwidth, and state-of-the-art analysis tools. The EURO-VO project aims at deploying an operational VO in Europe. Its objectives are the support of the utilization of the VO tools and services by the scientific community, the technology take-up and VO compliant resource provision and the building of the technical infrastructure.
Area/Discipline: Astronomy


Project name: FISHnet
Institution/Funder/Manager: Centre for e-Research, King’s College London, JISC
Project Description: Freshwater information sharing network
Area/Discipline: Freshwater Biology


Project name: HALOGEN - History Archaeology Linguistics Onomastics and GENetics
Institution/Funder/Manager: U Leicester, JISC
Project Description: The cross-disciplinary Roots of the British collaboration between scholars in humanities and genetics seeks to interrogate the evidence for the migration and/or continuity of human populations in the British Isles in the distant past. The HALOGEN project will support the data management needs of the researchers involved and thus establish organisational best practice in terms of data management planning and the support of diverse cross-disciplinary research data.
Area/Discipline: Ancient history/Genetics


Project name: I2S2
Institution/Funder/Manager: UKOLN/DCC/Soton/STFC, JISC
Project Description: Infrastructure for integration in structural sciences
Area/Discipline: Chemistry (with a view towards inter-disciplinary application)


Project name: Incremental: A step by step approach to informing, improving, & increasing research data curation practice
Institution/Funder/Manager: Cambridge University Library, Humanities Advanced Technology and Information Institute (HATII) at U Glasgow, DCC, JISC
Project Description: The aim of Incremental is to inform, improve and increase research data curation within UK HEIs, by providing exemplars and resources for others to use. Specific objectives are: (1) to investigate current practices and requirements at each institution; (2) to develop a plan for addressing these requirements; (3) to pilot tools and services at each HEI and then make further adjustments and recommendations; (4) embed the work within each institution; and (5) to deliver resources and findings to the DCC, DPC and JISC for wider dissemination. In addition to resources, the project will seek to provide information about their cost and sustainability.
Area/Discipline: Archaeology, Chemistry, English, Engineering and Medicine


Project name: IODP: Integrated Ocean Drilling Program
Institution/Funder/Manager: National Science Foundation (NSF), Japan’s Ministry of Education, Culture, Sports, Science and Technology (MEXT)
Project Description: IODP is an international marine research program that explores Earth's history and structure recorded in seafloor sediments and rocks, and monitors subseafloor environments. IODP builds upon the earlier successes of the Deep Sea Drilling Project (DSDP) and Ocean Drilling Program (ODP), which revolutionized our view of Earth history and global processes through ocean basin exploration.
The IODP oversees repositories around the world. Samples are distributed according to ODP and IODP policies.
Area/Discipline: Marine Sciences


Project name: MaDaM
Institution/Funder/Manager: Manchester eResearch Centre, JISC
Project Description: Pilot data management infrastructure for biomedical researchers
Area/Discipline: Biomedical Sciences


Project name: Managing Research Data: Gravitational Waves (MRD-GW)
Institution/Funder/Manager: STFC, University of Glasgow, JISC
Project Description: MRD-GW aims to examine the way in which Big Science data is managed, and to produce recommendations as appropriate. Gravitational Wave (GW) data generated by the LIGO Scientific Collaboration (LSC) will be used as a case study.
Area/Discipline: Particle physics/Astronomy


Project name: PANGAEA
Institution/Funder/Manager: Alfred Wegener Institute for Polar and Marine Research (AWI), DFG
Project Description: Publishing Network for Geoscientific & Environmental Data
Area/Discipline: Earth Sciences


Project name: PEG-BOARD
Institution/Funder/Manager: School of Geographical Sciences, University of Bristol, JISC
Project Description: Palaeoclimate and environment data generation - building open access to research data
Area/Discipline: Palaeoclimatology


Project name: Quixote
Institution/Funder/Manager: U Cambridge/CSIC
Project Description: The main objective/vision of the Quixote project is to design, test and deploy a modular, open source system of tools that allow computational chemistry data (now sitting in the darkness of individual hard disks) to be organized, shared, and queried.
Area/Discipline: Quantum Chemistry


Project name: Research Data MANTRA
Institution/Funder/Manager: U Edinburgh/JISC
Project Description: Aims to develop open, online learning materials which reflect best practice in research data management grounded in three disciplinary contexts: social science, clinical psychology, and geoscience. The resulting materials will be embedded in three participating postgraduate programmes and made available through the Transkills programme for use by all postgraduate and early career researchers as well as made available generally through an open license. In addition to web-based 'chapters' that students can work through at their own pace, the course will include video interviews with leading academics about data management challenges, and practical exercises in handling data in four software analysis environments: SPSS, NVivo, R and ArcGIS.
Area/Discipline: Social and political science, Geoscience, Clinical psychology


Project name: SageCite: Citing network models of disease and associated data
Institution/Funder/Manager: UKOLN, U Manchester, British Library, JISC
Project Description: SageCite will develop and test a Citation Framework linking data, methods and publications. The domain of bio-informatics provides a case study, and the project builds on existing infrastructure and tools. Citations of complex network models of disease and associated data will be embedded in leading publications, exploring issues around the citation of data including the compound nature of datasets, description standards and identifiers.
Area/Discipline: Bioinformatics


Project name: ShareGeo Open
Institution/Funder/Manager: EDINA, JISC
Project Description: ShareGeo Open is a spatial data repository that promotes data sharing between creators and users of geospatial data
Area/Discipline: Geography


Project name: SPQR: supporting productive queries for research
Institution/Funder/Manager: KCL, U Edinburgh, Humboldt U Berlin, JISC
Project Description: The overall aim is to investigate the potential of linked data for integrating datasets related to classical antiquity, addressing in particular the challenges raised by our material: its incompleteness, uncertainty and fuzziness. We will achieve this by developing mechanisms for breaking data out of silos and exposing it as linked data, using standard ontologies, and in particular the Europeana Data Model, as the semantic “glue” for linking data into a wider network of knowledge. The ultimate objective will be to create a common corpus or “RDF warehouse” of linked Classics data that can be explored, searched and enhanced by further annotations.
Area/Discipline: Classics, Epigraphy and Archaeology


Project name: SUDAMIH
Institution/Funder/Manager: University of Oxford, JISC
Project Description: Supporting data management infrastructure for the Humanities
Area/Discipline: Humanities


Project name: TARDIS
Institution/Funder/Manager: Monash University, Australian National Data Service (ANDS), University of Sydney and some other Australian institutions
Project Description: TARDIS is a multi-institutional collaborative venture that aims to facilitate the archiving and sharing of raw X-ray diffraction images (collectively known as a 'dataset') from the protein crystallography community.
Area/Discipline: Crystallography


Project name: VAMDC Project: Virtual Atomic and Molecular Data Centre
Institution/Funder/Manager: EU, CNRS, CMSUC, UCL, OU, UNIVIE, UU, KOLN, INAF, QUB, AOB, ISRAN, RFNC-VNIITF, IAO, IVIC, INASAN
Project Description: VAMDC aims at building an interoperable e-Infrastructure for the exchange of atomic and molecular data. It embraces on the one hand scientists from a wide spectrum of disciplines in atomic and molecular (AM) Physics with a strong coupling to the users of their AM data (astrochemistry, atmospheric physics, plasmas) and on the other hand scientists and engineers from the ICT community who are used to deploying interoperable e-infrastructure.
Area/Discipline: Astrophysics


Project name: WissGrid: Grid for Science
Institution/Funder/Manager: DFG, U Göttingen, Astrophysikalisches Institut (AIP), Alfred-Wegener-Institut (AWI), Deutsches Elektronen Synchrotron (DESY), Deutsches Klimarechenzentrum GmbH (DKRZ), Konrad-Zuse-Zentrum für Informationstechnik (ZIB), Universitätsmedizin Göttingen (UMG), Niedersächsische Staats- und Universitätsbibliothek (SUB), Technische U Dortmund (UDO), U Heidelberg, U Trier, U Wuppertal
Project Description: WissGrid’s objective is to establish long-term organisational and technical D-Grid structures for the academic world. WissGrid combines the heterogeneous needs from a variety of scientific disciplines and develops concepts for the long-term sustainable use of the organisational and technical grid infrastructure. In this context, the project aims to strengthen the organisational cooperation of scientists in the grid and to lower the entry barriers for new community grids.
Area/Discipline: Astrophysics, High Energy Physics, Climate Research, Medicine


Project name: XYZ Project
Institution/Funder/Manager: U Cambridge/IUCr/BioMed Central/Open Knowledge Foundation, JISC
Project Description: The XYZ Project will create a demonstrator of a new workflow for publishing data in support of full-text. The author prepares data for publication (if possible with validation) in a third-party trusted repository before the paper is submitted to a publisher. Our software will manage the deposition, the release to reviewers and the lifting of the embargo, whether for conventional publication or as a data journal.
Area/Discipline: Crystallography


Besides this preliminary set of discipline-specific projects currently running on research data (which Sonex will shortly complement with a list of general-purpose projects dealing with research data management), a thorough list of open data repositories for all areas may be found in the data repository section of the Open Access Directory (OAD).