The Rosalind Franklin Institute Data Management Plan

This data management plan is for data produced by science funded by UK academia. For industrial data management, please contact info@rfi.ac.uk.

Definitions:

The term raw data refers to data collected from experiments performed on the Franklin instruments.

The term analysed data refers to data obtained by processing raw data with third-party data analysis software, either manually or automatically.

The term metadata describes information referring to data collected from instruments or data generated by data analysis software, including (but not limited to) the context of the experiment, the experimental team, experimental conditions, electronic logbooks generated during the experiment and other logistical information.

The term users refers to the members of experimental teams who are part of a Franklin project.

The term metadata catalogue refers to a database of metadata containing links to raw or analysed data files. The catalogue can be accessed by a variety of methods, including (but not limited to) web browsers on desktop and mobile devices.

The term Open Access means available freely for use by anyone without fees, copyright or patent restrictions.

Data collection:

We collect data from scientific instruments and laboratories in the Franklin Hub and spoke institutes. We generate further data through analysis and modelling. We accommodate most formats that are generated by third-party instrument control software, but we recommend using standard open formats for the scientific community, e.g. .mrc for Cryo-EM, where possible.

Documentation and metadata:

Currently, we create basic metadata for experimental data automatically. We recommend that users enrich this basic metadata by linking it to project information and funding bodies and by adding extra scientific metadata. This can be done manually through our catalogue, and we are in the process of automating the links to our project management system.

Structural and administrative metadata is common across all projects. The scientific metadata varies by project. We can collaborate with users to automate scientific metadata collection.

Basic metadata captured at the Franklin includes: Dataset Name, Data Type (raw/processed), creation time, Owner, Principal Investigator, ORCID of owner, contact email, data ownership group, data access group, source folder, location of data on host, size, data format, creation location, technique, size of dataset, number of files.
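As an illustration only, the basic metadata fields listed above could be represented as a simple record. The sketch below is not the Franklin catalogue schema: the class name, field names, types and sample values are all hypothetical, and the two size entries in the list are assumed to describe the same quantity and are collapsed into a single field.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class BasicMetadata:
    """Hypothetical record mirroring the basic metadata fields listed above."""
    dataset_name: str
    data_type: str              # "raw" or "processed"
    creation_time: datetime
    owner: str
    principal_investigator: str
    owner_orcid: str
    contact_email: str
    ownership_group: str
    access_group: str
    source_folder: str
    host_location: str
    size_bytes: int             # assumption: one field for both size entries
    data_format: str
    creation_location: str
    technique: str
    number_of_files: int

    def __post_init__(self):
        # The plan names exactly two data types: raw and processed.
        if self.data_type not in ("raw", "processed"):
            raise ValueError(f"unknown data type: {self.data_type!r}")

    def to_json(self) -> str:
        # Serialise the record, writing the timestamp in ISO 8601 form.
        record = asdict(self)
        record["creation_time"] = self.creation_time.isoformat()
        return json.dumps(record, indent=2)


# Placeholder values, invented for this example.
example = BasicMetadata(
    dataset_name="example-dataset",
    data_type="raw",
    creation_time=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    owner="A. Owner",
    principal_investigator="B. Investigator",
    owner_orcid="0000-0000-0000-0000",
    contact_email="owner@example.org",
    ownership_group="example-owners",
    access_group="example-readers",
    source_folder="/instrument/session-001",
    host_location="/campaign/example-dataset",
    size_bytes=1_000_000,
    data_format=".mrc",
    creation_location="Franklin Hub",
    technique="Cryo-EM",
    number_of_files=120,
)
print(example.to_json())
```

A record of this shape could then be extended with project-specific scientific metadata, which the plan treats separately from the basic fields.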

Ethics and legal compliance:

Ethical and legal requirements depend on the funding and nature of the project.

The data is held in accordance with GDPR.

Data Storage and Security:

We operate the following storage model: scratch, campaign, and archive. We use an object store for our campaign storage, which is a secure, resilient store, capable of holding large amounts of data that can be readily accessed. The data is kept in only one cluster and there is currently no copy maintained elsewhere. The data is read-only except for the data administrators, who will only access it to perform their duties.

We have a data access model which is controlled via Identity and Access Management systems (IAMS).

A metadata catalogue is used to store associated research metadata.

Selection and preservation:

We use the UKRI Concordat on Open Research Data as the basis for the preservation of data. All raw data is automatically saved in the campaign store. The data is kept for 10 years, or for the duration of the project if that is longer.
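The retention rule can be written out as a small calculation. This is a minimal sketch under stated assumptions: the plan does not say when the 10-year period starts, so the sketch assumes it starts on the date the data was created, and the function and parameter names are hypothetical.

```python
from datetime import date
from typing import Optional

RETENTION_YEARS = 10  # retention period stated in the plan


def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a target year that is not a leap year.
        return d.replace(year=d.year + years, day=28)


def retention_end(created: date, project_end: Optional[date] = None) -> date:
    """Earliest date after which the data is no longer required to be kept.

    Assumption: the 10-year period is counted from the creation date.
    If the project ends later than that, the project end date applies.
    """
    ten_year_mark = add_years(created, RETENTION_YEARS)
    if project_end is not None and project_end > ten_year_mark:
        return project_end
    return ten_year_mark


print(retention_end(date(2020, 3, 1)))                    # 2030-03-01
print(retention_end(date(2020, 3, 1), date(2035, 1, 1)))  # 2035-01-01
```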

Analysed data is uploaded by the user, so users themselves select which analysed data is of value.

Once a project is closed, we will back up data to tape and submit it to the archive.

Data sharing:

Currently, data is shared on an ad hoc basis, depending on the collaboration agreement of the project.

All data access uses secure mechanisms, e.g. HTTPS and Globus (GridFTP), each with appropriate authentication and authorization mechanisms.

In future we will provide a mechanism to publish data for Open Access.

The Franklin allows data to be uploaded to external community data repositories.

If an external user requires data that is not Open Access, they need to enter into a collaboration agreement with the Franklin.

Responsibilities and resources:

This plan is implemented on a best-efforts basis by the AI Core Team. Users are responsible for working with the team to make sure their data and metadata are captured and correct.
