ICOS

ICOS improved data lifecycle

Document

Deprecated document

Next version(s): 7EGkrJU-YW9tvWOT1nnFr8Pn
Latest version(s): wkXDf9kn5qAaKtt6EHQTr70M
10.18160/D2JV-KB6B (target, metadata)
11676/H_u3YXcPbs4XUKdnaQtv84rd (link)
ICOS improved data lifecycle.pdf

ICOS provides long-term, high-quality observations that follow (and cooperatively set) the global standards for the best possible quality data on the atmospheric composition for greenhouse gases (GHG), greenhouse gas exchange fluxes measured by eddy covariance and CO2 partial pressure at water surfaces. The ICOS observational data feeds into a wide range of sciences, covering for example plant physiology, agriculture, biology, ecology, energy & fuels, forestry, hydrology, (micro)meteorology, environmental science, oceanography, geochemistry, physical geography, remote sensing, earth, climate and soil science, and combinations of these in multi-disciplinary projects.

As ICOS is committed to providing all data and methods in an open and transparent way as free data, a dedicated system is needed to secure the long-term archiving and availability of the data together with the descriptive metadata that belongs to the data and is needed to find, identify, understand and properly use the data, also in the far future, following the FAIR data principles. An added requirement is that the full data lifecycle should be completely reproducible to enable full trust in the observations and the derived data products.

In this report we define and describe the implementation of a comprehensive, unified metadata flow from the Thematic Centres (TCs) to the Carbon Portal. The design criterion for this system was to integrate the operational (legacy) database systems at the TCs with the data portal as far as possible, thereby preserving the investments in the robust and proven QA/QC and database systems at the TCs and combining these with the benefits of a linked open data system with connected data licence check, usage tracking and dynamic, machine-operable data and metadata based on a versioned RDF triple store.

We also developed a connected DOI minting system and implemented the generation of data collections and a linked system for versioning of the data, all connected to the ontology-driven single point of ingestion, optimised for machine-to-machine communication. The system has been introduced incrementally in full operational mode over the last years and is now in place and used by all ICOS domains for all data streams, from raw data through near-real-time to final quality-controlled data, and by the external users that provide elaborated products.

The licence check and data usage tracking have been implemented in a completely unobtrusive way and are flexible enough to begin interoperating with major data portals such as those of FLUXNET, NEON, SOCAT and WMO WDCGG. The use of DOIs increases the exposure of the ICOS data to global and European data portals such as the future EOSC portal, the current OpenAIRE portal and Google Dataset Search. The ICOS data is already finding its way to many users, and with the growing length of the ICOS time series in all domains and the interoperation with the global portals, the use of ICOS data can now grow further.

2020
ICOS ERIC
Data management, FAIR
Vermeulen, A., Hazan, L., Pfeil, B., Lankreijer, H., Hellström, M., Mirzov, O., D'Onofrio, C., Rivier, L., Jones, S., Papale, D., Juurola, E., 2020. ICOS improved data lifecycle. ICOS ERIC. https://doi.org/10.18160/D2JV-KB6B
BibTex
@article{https://doi.org/10.18160/d2jv-kb6b,
  doi = {10.18160/D2JV-KB6B},
  url = {https://meta.icos-cp.eu/objects/wkXDf9kn5qAaKtt6EHQTr70M},
  author = {Vermeulen, Alex and Hazan, Lynn and Pfeil, Benjamin and Lankreijer, Harry and Hellström, Margareta and Mirzov, Oleg and D'Onofrio, Claudio and Rivier, Leo and Jones, Steve and Papale, Dario and Juurola, Eija},
  keywords = {Data management, FAIR},
  title = {ICOS improved data lifecycle},
  publisher = {ICOS ERIC},
  year = {2020},
  copyright = {CC0}
}
RIS
TY  - RPRT
T1  - ICOS improved data lifecycle
AU  - Vermeulen, Alex
AU  - Hazan, Lynn
AU  - Pfeil, Benjamin
AU  - Lankreijer, Harry
AU  - Hellström, Margareta
AU  - Mirzov, Oleg
AU  - D'Onofrio, Claudio
AU  - Rivier, Leo
AU  - Jones, Steve
AU  - Papale, Dario
AU  - Juurola, Eija
DO  - 10.18160/D2JV-KB6B
UR  - https://meta.icos-cp.eu/objects/wkXDf9kn5qAaKtt6EHQTr70M
AB  - ICOS provides long-term, high-quality observations that follow (and cooperatively set) the global standards for the best possible quality data on the atmospheric composition for greenhouse gases (GHG), greenhouse gas exchange fluxes measured by eddy covariance and CO2 partial pressure at water surfaces. The ICOS observational data feeds into a wide range of sciences, covering for example plant physiology, agriculture, biology, ecology, energy & fuels, forestry, hydrology, (micro)meteorology, environmental science, oceanography, geochemistry, physical geography, remote sensing, earth, climate and soil science, and combinations of these in multi-disciplinary projects.

As ICOS is committed to providing all data and methods in an open and transparent way as free data, a dedicated system is needed to secure the long-term archiving and availability of the data together with the descriptive metadata that belongs to the data and is needed to find, identify, understand and properly use the data, also in the far future, following the FAIR data principles. An added requirement is that the full data lifecycle should be completely reproducible to enable full trust in the observations and the derived data products.

In this report we define and describe the implementation of a comprehensive, unified metadata flow from the Thematic Centres (TCs) to the Carbon Portal. The design criterion for this system was to integrate the operational (legacy) database systems at the TCs with the data portal as far as possible, thereby preserving the investments in the robust and proven QA/QC and database systems at the TCs and combining these with the benefits of a linked open data system with connected data licence check, usage tracking and dynamic, machine-operable data and metadata based on a versioned RDF triple store.

We also developed a connected DOI minting system and implemented the generation of data collections and a linked system for versioning of the data, all connected to the ontology-driven single point of ingestion, optimised for machine-to-machine communication. The system has been introduced incrementally in full operational mode over the last years and is now in place and used by all ICOS domains for all data streams, from raw data through near-real-time to final quality-controlled data, and by the external users that provide elaborated products.

The licence check and data usage tracking have been implemented in a completely unobtrusive way and are flexible enough to begin interoperating with major data portals such as those of FLUXNET, NEON, SOCAT and WMO WDCGG. The use of DOIs increases the exposure of the ICOS data to global and European data portals such as the future EOSC portal, the current OpenAIRE portal and Google Dataset Search. The ICOS data is already finding its way to many users, and with the growing length of the ICOS time series in all domains and the interoperation with the global portals, the use of ICOS data can now grow further.
KW  - Data management
KW  - FAIR
PY  - 2020
PB  - ICOS ERIC
ER  -
2 MB (1716168 bytes)
1ffbb761770f6ece1750a767690b6ff38add437c1d6b5982be201f07f81a3606
H/u3YXcPbs4XUKdnaQtv84rdQ3wda1mCviAfB/gaNgY

Submission

2022-02-09 17:25:23
2022-02-09 17:25:21

Statistics

35