
How the data you collect fuels collective knowledge: the FAIR data journey
LandSeaLot is working with citizen scientists and local communities to collect and share data about land-sea interface areas such as river estuaries, deltas, beaches and bays. These areas are crucial to life on earth and the livelihoods of billions of people. To achieve these goals, our communities are testing and using low-cost sensors: small devices that can be used to collect coastal data in accessible and affordable ways.
Collecting coastal data is only the beginning! LandSeaLot is committed to making data FAIR: Findable, Accessible, Interoperable, and Reusable for all. Findable in that interested users can easily discover data that may be relevant to them. Accessible in that, once found, users are able to open, download and use the data as they need. Interoperable, meaning data can be correctly interpreted across different systems, machines or organisational boundaries, in compatible formats even when they come from different sources and are of different types. And Reusable, meaning data can be used again and again for as long as needed.
Let’s take a closer look at what happens to the data to ensure they are correctly managed and made available to all.
Data and metadata are collected and quality controlled
Data are collected in coastal areas to gather information on specific parameters like water temperature, water quality or salinity. At the same time, metadata are collected. Metadata provide extra information about the data, such as the date, time, depth, exact location, unit of measurement, type of device used, the data file format, and ideally the level of quality control (QC). In some cases, QC is only applied at a later stage. To use an analogy, the data are like the music you listen to, and the metadata are the track listing, artist information, credits and other information that comes with it. The richer and clearer the metadata, the more valuable and trustworthy the data are for others.
Metadata need to follow the standardised terms and formats used in the marine data domain so that automated systems, such as search engines, can correctly identify and categorise the data (making them Findable and Accessible), and so that the data can be accurately interpreted across different systems or organisations (Interoperable and Reusable).
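As a sketch, a single observation with its metadata might look like the record below. The field names and values are illustrative assumptions, not a real LandSeaLot schema; the `standard_name` entry uses the CF (Climate and Forecast) standard-name vocabulary, one widely used set of standardised terms in the marine data domain.

```python
# A hypothetical observation record: the measurement itself ("data")
# plus the descriptive context ("metadata") that makes it interpretable.
observation = {
    "value": 14.2,  # the data point itself (e.g. a temperature reading)
    "metadata": {
        "standard_name": "sea_water_temperature",  # CF standard name (controlled vocabulary)
        "units": "degree_Celsius",
        "time": "2024-06-01T10:30:00Z",   # ISO 8601 timestamp, UTC
        "latitude": 51.97,
        "longitude": 4.12,
        "depth_m": 0.5,
        "instrument": "low-cost thermistor (example)",
        "qc_flag": 1,  # quality-control flag; see the QC section below
    },
}

# Because the record uses shared vocabularies and formats, another
# system can interpret it without asking the original collector.
print(observation["metadata"]["standard_name"],
      observation["value"],
      observation["metadata"]["units"])
```

The same idea applies whatever the storage format (CSV, NetCDF, JSON): the measurement travels together with enough standardised context to be understood elsewhere.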
Data are stored locally & shared
After being stored locally, the collected data with associated metadata are ready for publishing to potential “first users.” There are several paths for achieving this:
• Files can be shared on a file server, in the cloud, or on a physical machine for other people to view and download;
• Data can be shared by the developers of the sensor or device that was originally used to collect the data;
• Data can be made accessible via an ERDDAP server, or other compliant technology. Learn more about the technical details of ERDDAP here.
You can think of ERDDAP as a streaming service for oceanographic data. It offers tools to make data Findable and Accessible for both humans and machines, providing on-demand access and letting users pick and choose the data they want to view. You can filter by time, location, or specific measurements, and access data in many different file formats.
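To make the "pick and choose" idea concrete, here is a minimal sketch of how an ERDDAP "tabledap" request URL is composed: a dataset ID, an output format, the variables you want, and then &-separated constraints on time and location. The server address and dataset ID are placeholders, not a real service; a production client should also percent-encode the constraint characters.

```python
# Sketch: composing an ERDDAP tabledap request URL by hand.
# The server and dataset ID are hypothetical; the general pattern
# (dataset.format?variables&constraint&constraint...) is ERDDAP's.
server = "https://example.org/erddap"      # placeholder server
dataset_id = "coastal_temperature_demo"    # placeholder dataset ID

# Which columns to return...
variables = "time,latitude,longitude,sea_water_temperature"

# ...and which rows: a one-week window inside a latitude band.
constraints = [
    "time>=2024-06-01T00:00:00Z",
    "time<=2024-06-07T00:00:00Z",
    "latitude>=51.0",
    "latitude<=52.5",
]

# ".csv" selects the output format; ERDDAP servers also offer
# other formats such as JSON and NetCDF from the same query.
url = f"{server}/tabledap/{dataset_id}.csv?{variables}&" + "&".join(constraints)
print(url)
```

The same filtered query can be issued by a person in a browser or by a script, which is what makes the data Accessible to both humans and machines.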
Data are organised & combined for a larger community
Beyond the direct use of the data via servers like ERDDAP, which make oceanographic data available to first users, oceanographic data are incredibly valuable for a larger European community of researchers, students, consultants, businesses and decision-makers working on a multitude of issues relating to the marine environment, blue economy and other domains.
Before being made available to these users, data need to be quality controlled by specialised data centres and then passed on, quality controlled, to aggregators at the European level. Aggregators organise their networks of national data centres and offer oceanographic data by theme (such as EurOBIS for biodiversity data or SeaDataNet for delayed-mode physical and chemical data). The aggregators bring together high-quality marine data from hundreds of different sources across Europe – including national hydrographic offices, research institutes, environmental agencies, private companies and citizen science initiatives – collecting fragmented data that might otherwise remain hidden or inaccessible.
Data Quality Control: A Deep Dive
An important part of metadata communicates the quality of the data. For quality control (QC), a flagging system helps users understand how trustworthy the information they are using is. The following QC flagging scheme, based on the UN's International Oceanographic Data and Information Exchange (IODE), can be used for data collected from coastal observations:
- 0 (Not Evaluated / Not Applicable): The QC test was not performed, or the data point isn’t relevant to a particular test.
- 1 (Good / Pass): The data point is considered high quality and has passed all relevant QC tests.
- 2 (Unknown / Not Available): Quality was not evaluated, or the information is unavailable.
- 3 (Questionable / Suspect): The data point is suspect, potentially unusual, or might require further review. It hasn’t definitively failed but warrants caution.
- 4 (Bad / Fail): The data point is considered erroneous and has failed one or more QC tests.
- 9 (Missing Data): The data value is absent or missing.
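The flag values above can be applied programmatically. The sketch below assigns flags using a simple gross-range test on sea temperature; the bounds are illustrative assumptions, not an official QC procedure, and real QC suites run several such tests per data point.

```python
# Flag values as listed above (IODE-style scheme).
FLAGS = {0: "Not Evaluated", 1: "Good", 2: "Unknown",
         3: "Questionable", 4: "Bad", 9: "Missing"}

def gross_range_flag(value, low=-2.0, high=35.0):
    """Assign a QC flag from a simple gross-range test.

    The bounds are illustrative limits for sea temperature in
    degrees Celsius, not an official threshold.
    """
    if value is None:
        return 9            # Missing Data
    if low <= value <= high:
        return 1            # Good / Pass
    return 4                # Bad / Fail

readings = [14.2, 15.1, None, 48.0]   # 48 °C is clearly out of range
flags = [gross_range_flag(v) for v in readings]
print(list(zip(readings, flags)))

# Downstream users can then decide which flags to accept, e.g.
# keeping only 'Good' values for a strict analysis:
good = [v for v, f in zip(readings, flags) if f == 1]
print(good)
```

Storing the flag alongside each value, rather than silently deleting suspect data, is what lets different users apply their own quality thresholds later.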
The quality of the data could be affected by many factors, including the precision of the sensor used to collect it, the procedures followed when deploying instruments or collecting samples, as well as errors, faults or malfunctions.
Data that do not entirely meet the ‘Good/Pass’ mark can still be valuable. Data processing and correction techniques can be applied in some cases, and for some users and applications lower-quality data may be sufficient. However, complete metadata are essential for users to know and understand data quality, allowing them to decide if and how they want to use the data.
Data are made available via EMODnet, the European Marine Observation and Data Network
EMODnet is a long-term EU service that collects, standardises, and harmonises marine in situ data and makes them publicly available as data products with underlying observation datasets. In situ refers to data that are collected at the location that is being studied, in the water (in contrast to, for example, observations obtained from satellites or drones). EMODnet provides high quality, reliable datasets for a diverse and democratic user base across multiple sectors. These data originate from thematic data aggregators such as EurOBIS and SeaDataNet, as well as from national resources that collect data on, for example, human activities.
Once marine data are included in EMODnet, they are open and accessible to anyone for any use, according to the FAIR principles. EMODnet data also feed into EDITO: the core infrastructure of the European Digital Twin of the Ocean. EDITO makes world-class marine data freely available to all, and allows interested parties to develop digital twins, explore ‘what-if’ scenarios and support science-based decision-making to better address challenges like climate change, pollution and biodiversity loss. Read more about EDITO here.
Data are reused
When marine data are FAIR and available to everyone, more people can benefit from high-quality data and shared knowledge that informs sustainable actions and decision making on a local, regional and international level. The core purpose of data aggregation platforms is to break down “data silos” – where information is scattered, fragmented and incompatible – by aggregating, harmonising and standardising data. They also perform quality control checks, create derived data products and enhance accessibility via user-friendly interfaces, visualisations and diverse output formats. The data that researchers, citizen scientists and local communities collect have a crucial role to play in contributing to EMODnet data products!