Cataloguing Scientific Datasets and Metadata using SciCat
SciCat is a database and web application for managing scientific datasets and their associated metadata in line with the FAIR principles (findable, accessible, interoperable, reusable). At the Franklin, data generated by multiple scientific instruments is automatically indexed in SciCat as it is captured, so that it can be found and shared. This gives our scientific users a single place to discover data generated at the Franklin, download it and share it with partners.
We at the Franklin contribute to the development of the SciCat project in several ways. Two of our developers regularly attend the project's development meetings. We have helped test the software and refine new features, most recently the migration of the backend to a new technology stack and authentication through Keycloak using OpenID Connect (OIDC) for single sign-on. We also contribute to the related PySciCat project, a Python client for the SciCat API. In addition, we have recently built a Prometheus exporter that exports key metrics from SciCat on an ongoing basis, such as the number and total size of datasets per user, instrument and principal investigator. Finally, we are helping to make SciCat more scalable, for example by developing Kubernetes manifests for deploying it.
In the future, we would like to contribute further to SciCat by moving it to a microservices architecture that can handle the large volume of scientific data generated at the Franklin and make more datasets accessible to the scientific community and the general public.