P4A: Managing Research (and Open) Data
[24x7] Revisiting Self-Deposit of Scientific Data
Stanford University, United States of America
Sharing scientific data is increasingly valuable for reproducible science, further investigation, and innovation. To this end, repositories facilitate data sharing by making scholarly data available. We are at an impasse, however. Librarian-mediated approaches to self-deposit of scientific data are very resource-intensive, and the repository services offered to researchers are often limited. Self-deposit is a challenging use case because it encompasses data preparation, metadata description, upload, visualization, annotation, sharing, publication, access, rights, preservation, citation, and discovery services. This editorial proposes that we revisit the value proposition we offer for self-deposit and reduce the burden of its resource-intensive workflows.
[24x7] CERN Open Data and Data Analysis Preservation
We present the newly launched CERN Open Data Portal and related long-term data analysis preservation activities. Using the Invenio digital library platform and taking inspiration from OAIS preservation practices, we capture the knowledge associated with successive data analysis steps for further reuse. The aim is to preserve not only information about research datasets, but also about the underlying user software and virtual machine platforms used to study them, together with any configuration parameters and high-level physics information associated with the analysis process. The CERN Open Data Portal disseminates selected primary and reduced datasets from LHC experiments and offers several high-level tools, such as an interactive event display and histogram plotting interfaces, that allow the general public and data scientists to visualise and further work with the data. The ultimate goal of data analysis preservation is to be able to reproduce an analysis many years after its initial publication, extending the impact of preserved analyses through their future revalidation and recasting.
[24x7] Integration and Adoption: An ORCID story
Symplectic, United Kingdom
As academic engagement with institutional repositories moves from “why should I do this?” to “good idea, but how can the Library make this easier for me?”, the need for consistent and unambiguous metadata has never been greater.
Metadata consistency includes the unambiguous identification of authors, editors, supervisors and other contributors to repository objects. Until the launch of ORCID, however, there was no common means of unambiguously identifying authors.
In this presentation, we will explain how Imperial College London, the first institution to integrate a research information management system with an institutional repository, enabled over 1,200 research-active staff to claim an ORCID iD within a week, and then automated the harvesting of data from ORCID to help populate Imperial’s institutional repository with verified metadata.
Islandora as an access system for iRODS managed information packages
Zuse Institute Berlin (ZIB), Germany
Accessing information packages with Islandora is straightforward, but less so when they reside within a federated data management environment. In our case, dissemination information packages live in the Fedora object store for immediate access. The archival information packages are stored safely in a hierarchical storage infrastructure managed by iRODS and are accessible only for administrative and preservation purposes. We present a data model that supports both use cases using a single Islandora instance. To integrate with iRODS, we developed an Islandora module that displays and delivers data and metadata from the storage location. This solution also allows us to extend the system with further preservation workflow actions that will be required in the future.
Databrary: A research-centered repository for video data
New York University, United States of America
As a research data repository, Databrary focuses specifically on the storage, discoverability, and sharing of video-based datasets within the developmental and learning sciences. Storing video presents its own opportunities and challenges; the challenges include protecting research subjects’ privacy and creating and storing, in a standardized fashion, metadata that comes from different research projects. Databrary has implemented policies and practices within a functioning web application that meets both the needs of researchers and the preservation and access requirements for sharing these datasets into the future. The lessons learned so far in developing Databrary could serve as a model for establishing practices and workflows for gathering and organizing research data that lift the burden from researchers and could also feed into established library systems for broader findability and accessibility.
The Hydra Common Data Model
University of California, San Diego; Stanford University; Princeton University
One of the many successes of the Hydra community is the fundamental notion from which its name is derived: the concept of many interfaces (“heads”) on top of a single repository (the “body”). The recent release of Fedora 4, with its internal RDF-centric model, has spurred efforts toward a community-wide model of collections and works, so that the heads can rely on the body to behave as they expect. That model has been designed and vetted by the Hydra community, and its architecture and initial implementations will be presented in this paper.