Articles | Volume 2-oae2023
27 Nov 2023
 | OAE Guide 2023 | Chapter 13

Data reporting and sharing for ocean alkalinity enhancement research

Li-Qing Jiang, Adam V. Subhas, Daniela Basso, Katja Fennel, and Jean-Pierre Gattuso

Effective management of data is essential for successful ocean alkalinity enhancement (OAE) research, as it guarantees the long-term preservation, interoperability, discoverability, and accessibility of data. OAE research generates various types of data, such as discrete bottle measurements, autonomous measurements from surface underway and uncrewed platforms (e.g., moorings, Saildrones, gliders, Argo floats), physiological response studies (e.g., laboratory, mesocosm, and field experiments, and natural analogues), and model outputs. This paper addresses data and metadata standards for all these types of OAE data. As part of this study, existing data standards have been updated to accommodate OAE research needs, and a completely new physiological response data standard has been introduced. Additionally, an existing ocean acidification metadata template has been upgraded to be applicable to OAE research. This paper also presents controlled vocabularies for OAE research, including types of OAE studies, source materials for alkalinization, platforms, and instruments. These guidelines will aid OAE researchers in preparing their metadata and data for submission to permanent archives. Finally, the paper provides information about available data assembly centers that OAE researchers can utilize for their data needs. The guidelines outlined in this paper are applicable to ocean acidification research as well.

1 Introduction

Data management plays a crucial role in bridging the gap between field observations and subsequent research based on these data (Brett et al., 2020). It is an essential component of ocean alkalinity enhancement (OAE) research to help evaluate its potential environmental risks and co-benefits, understand its effectiveness and scalability, and support its measurement, reporting, and verification (MRV) efforts for carbon credit accounting. Specifically, effective data management enables long-term preservation of data, ensures compliance with uniform metadata and data standards, facilitates interoperability and compatibility, and enables data discovery and access (de La Beaujardière et al., 2010).

Long-term preservation can be achieved by publishing data in archives and preserving them in non-proprietary, archivable formats to ensure accessibility and retrievability for extended periods of time, spanning decades to even centuries. This helps prevent data loss or degradation caused by technological obsolescence, human errors, natural disasters, or other factors. Datasets, unlike journal publications, are frequently revised or updated after they are released. This may occur as a result of additional quality control (QC) or the acquisition of additional data or metadata. While ensuring access to the latest version of a dataset is crucial, preserving previous versions is equally important. All historical versions should be retained on a permanent basis. Otherwise, research based on previous iterations of a dataset may become unverifiable.

Data standards are a set of rules and specifications that define how data should be stored, structured, and formatted (Berman and Fox, 2013). Their purpose is to promote consistency and interoperability, reducing ambiguity in data exchange and interpretation. In oceanographic studies, data standards cover elements such as the technical format for storing data (e.g., Microsoft Excel, comma-separated values (CSV), or netCDF); standardized column header abbreviations and units; standardized methods for calculating certain variables; and missing value indicators. It is worth noting that the newer XLSX format is based on Office Open XML, unlike the earlier binary, proprietary XLS format. As a result, it is no longer a proprietary format. By adhering to these standards, researchers can ensure that their data are organized, structured, and formatted in a way that allows for easy sharing, interpretation, and reuse.
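As a minimal sketch of how such conventions might be applied when preparing a file for submission, the Python snippet below writes discrete bottle records to CSV with one fixed set of column headers and a single missing value indicator. The column names other than EXPOCODE, the example values, and the -999 indicator are assumptions chosen for illustration; the actual headers and indicators should be taken from the relevant data standard.

```python
import csv
import io

# Hypothetical column headers and missing value indicator (illustration only).
COLUMNS = ["EXPOCODE", "Station_ID", "Depth", "DIC", "TA"]
MISSING = "-999"

def rows_to_csv(records):
    """Serialize a list of dict records to CSV text with uniform headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        restval=MISSING,        # fills columns that are absent from a record
        extrasaction="raise",   # reject columns that are not in the standard
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        # Replace explicit None values with the missing value indicator.
        cleaned = {k: (MISSING if v is None else v) for k, v in record.items()}
        writer.writerow(cleaned)
    return buffer.getvalue()

csv_text = rows_to_csv([
    {"EXPOCODE": "EXAMPLE0001", "Station_ID": 1, "Depth": 5.0,
     "DIC": 2010.3, "TA": 2250.1},
    # DIC is explicitly missing; TA was not reported at all.
    {"EXPOCODE": "EXAMPLE0001", "Station_ID": 1, "Depth": 50.0, "DIC": None},
])
print(csv_text)
```

Rejecting unknown columns is a design choice in this sketch: it surfaces header typos early instead of silently producing a nonconforming file.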

Metadata refer to structured information that provides context and details about a dataset, such as its title, authors or creators, observed properties, instruments used, measurement and calibration details, uncertainty, and relevant keywords (Guenther and Radebaugh, 2004). They are often defined as data about data. Metadata serve two main purposes: first, they provide users with detailed descriptions about a dataset, helping them understand it; second, they offer search keywords that make the dataset findable and retrievable. Overall, metadata are a crucial aspect of data management and are essential for subsequent data use.

Controlled vocabularies are defined as lists of pre-defined and standardized terms (Zeng and Qin, 2008). The use of controlled vocabularies plays a very important role in effective data management, as it helps ensure that the data are documented, findable, and accessible in a consistent way. By using a limited and standardized set of terms, controlled vocabularies help improve metadata interpretation and data findability by eliminating spelling variations, synonyms, and other forms of variability. Additionally, controlled vocabularies help facilitate metadata interoperability between different systems, making it easier to exchange and integrate data between different organizations and platforms.

Data citation involves referencing a dataset for the purpose of attributing credit and facilitating access (TGDCSP, 2013). It not only enables data users to acknowledge and give credit to the producers of a dataset used in a research study or project, but also allows readers to access and use the dataset for additional research. Data citation plays an important role in promoting scientific reproducibility and accountability and facilitating data sharing and reuse. As data sharing becomes more prevalent, data citation is increasingly important for tracking the impact of datasets and ensuring that research is built on a strong foundation of credible and transparent data.

In essence, data management is a service aimed at fulfilling the data needs of the research community. Therefore, efforts to establish best practices, such as the creation of new data and metadata standards and controlled vocabularies, should be driven by the needs and preferences of the research community. It is equally important for researchers to adhere to these guidelines when preparing high-quality metadata and data packages for submission to appropriate repositories. While this paper also sheds light on recommendations and requirements for data assembly centers to build customized data management systems that meet the data needs of the OAE research community, the presented data and metadata standards primarily serve submission purposes. During the development of these data and metadata standards, we ensured that they have a wide range of applications in other research fields, including ocean acidification (OA).

2 Data standards

Ocean alkalinity enhancement research encompasses a wide range of topics, resulting in various types of data. These different types of OAE data can be classified into four categories: (a) discrete bottle-based measurements; (b) autonomous measurements from surface underway (e.g., surveys conducted on ships of opportunity, or SOOP), time series (e.g., moorings), and uncrewed platforms (e.g., Saildrones, gliders, Argo floats); (c) physiological response studies, including laboratory, mesocosms, field experiments, natural analogues, and more broadly biological and geochemical experimental studies; and (d) model outputs (Table 1). To ensure consistency and interoperability, it is recommended to use uniform data standards for each type of OAE data (Tanhua et al., 2019; Brett et al., 2020). For category (b), two data standards are available: one for surface underway measurements and the other for autonomous sensor data from uncrewed platforms, including moorings, Saildrones, gliders, Argo floats, etc. This is because the measurement of one of the key variables, the carbon dioxide fugacity (fCO2), involves the use of two different systems, depending on whether it is monitored during underway operations or time-series mooring. Note that other communities may use FCO2 to refer to the flux of CO2 across an interface (e.g., sea–air interface, biosphere–atmosphere interface). It is recommended to use an italicized f for fugacity and a capital F for fluxes. Category (c) may also include abiotic responses, such as (but not limited to) saturation state thresholds for calcium carbonate precipitation, mineral dissolution rate studies, and CO2 uptake efficiency determinations. In these cases, inorganic variables associated with these data standards should be sufficient to capture all of the relevant study details.

Table 1. Proposed data standards for the purpose of submitting common types of OAE data. A CTD rosette consists of a metal frame that houses a collection of sensors and water sampling bottles. The abbreviation CTD stands for conductivity, temperature, and depth, which are the three primary variables measured by a CTD sensor. Furthermore, the rosette frame can accommodate additional sensors to measure various oceanographic variables such as oxygen, chlorophyll a, etc.


Table 1 presents a list of recommended data standards for each type of OAE data mentioned above. This table serves as a reference for researchers and data managers to ensure that their data meet the required standards for long-term preservation, interoperability, and reuse. For all data standards, users may remove irrelevant columns and add necessary ones. The data standard for discrete bottle-based observations is described in detail by Jiang et al. (2022). The data standards for surface underway and autonomous sensor measurements are an update to what the community has been using over the last several decades.

The data standard for physiological response OAE studies is developed as part of this study, covering laboratory experiments, mesocosms, field experiments, and natural analogues, and more broadly biological and geochemical experimental studies. It emphasizes the experimental setup while allowing users to document their own response variables. For biological variables, it is important to state the taxonomy (a taxon or a community) upon which the variable is studied. For example, if the growth rate of a certain species of salmon is studied, the “variable/parameter” is growth rate and the “biological subject” is that species of salmon. One could group/capture organismal data in three forms: taxonomic, functional, and phylogenetic. It is recommended to use the species reference databases from the Catalogue of Life (last access: 5 November 2023), Integrated Taxonomic Information System (ITIS; last access: 5 November 2023), World Register of Marine Species (WoRMS; last access: 5 November 2023), or Paleobiology Database (PBDB; last access: 5 November 2023). For life stages, consider using an existing controlled vocabulary (last access: 5 November 2023).

Model outputs often involve extensive data volumes, reaching gigabytes or even terabytes, making it necessary to address standards for the operational provision of model data (i.e., making data available for weeks to years) separately from long-term or permanent archiving. The operational provision of model output data typically relies on three integrated standards: network Common Data Form (netCDF) files, the Climate and Forecast (CF) metadata conventions, and the Open-source Project for a Network Data Access Protocol (OPeNDAP) libraries for remote data access. NetCDF is open-source software that has been developed and supported by the University Corporation for Atmospheric Research's Unidata program (last access: 5 November 2023) since 1989. NetCDF enables the creation and dissemination of self-contained data files with metadata, using formats that are independent of any specific machine or system. It has long been a standard for generation of model outputs and climatological data products in the ocean and climate modeling communities. The CF metadata conventions provide guidelines for encoding datasets in netCDF, specifying the reporting of space and time coordinates, units, variable names, and other relevant information (Hassell et al., 2017). CF-compliant netCDF files are advantageous due to their self-describing nature, eliminating the need for additional information to interpret their contents. CF is a living and open standard that encourages community participation in proposing enhancements and reporting issues (last access: 5 November 2023). OPeNDAP, which is based on the Data Access Protocol (DAP), allows remote access to CF-compliant netCDF files stored on web servers through a set of libraries, making compliant datasets highly interoperable and findable. Furthermore, it enables users to request subsets of data without the need to transfer potentially very large files when only a subset is of interest.
Together, the netCDF–CF–OPeNDAP standard provides a high level of readability and interoperability for model outputs, gridded data products (e.g., satellite observations), and ocean observations (e.g., Argo). The evolution of these standards and their community-wide acceptance are discussed in Hankin et al. (2010).
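To illustrate the subsetting idea, the sketch below builds a request URL using the index-based hyperslab notation of DAP2 constraint expressions, `variable[start:stride:stop]`, with one bracket group per dimension. The server address, the variable name `talk`, and the dimension order are hypothetical; a real request would use the dimensions reported by the server, and some clients need to percent-encode the brackets.

```python
def dap_subset_url(dataset_url, variable, index_ranges, response="ascii"):
    """Build a DAP2-style request URL for an index-based subset of one variable.

    index_ranges is a list of (start, stride, stop) tuples, one per dimension;
    start and stop are zero-based, inclusive array indices.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in index_ranges
    )
    return f"{dataset_url}.{response}?{variable}{hyperslab}"

url = dap_subset_url(
    "https://example.org/opendap/oae_model_output.nc",  # hypothetical server
    "talk",                                             # hypothetical variable
    # Assumed dimension order: time, depth, latitude, longitude.
    [(0, 1, 0), (0, 1, 9), (100, 1, 120), (200, 1, 240)],
)
print(url)
```

Only the requested slab (one time step, ten depth levels, and a small horizontal window in this example) would be transferred, not the full file.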

Box 1. Metadata elements that should be included in netCDF files generated out of a biogeochemical ocean model. Information specific to OAE studies is indicated in italic font. Refer to Fennel et al. (2023, this Guide) for more context.


The netCDF–CF–OPeNDAP standard enables provision of model outputs in accordance with the FAIR principles (Wilkinson et al., 2016), provided a few conditions are met. NetCDF–CF–OPeNDAP datasets can be findable because machine-readable metadata enable automatic discovery, accessible because of the standardized communications protocol that is open and universally implementable, interoperable because of the standardized, machine-readable metadata and data and the ability to subset and aggregate datasets, and reusable because rich metadata using standardized naming conventions can be provided. The necessary conditions for a netCDF–CF–OPeNDAP dataset to qualify as FAIR are that (a) it is openly available and has a globally unique and persistent identifier (e.g., a digital object identifier, or DOI), (b) data and metadata are registered and indexed in a searchable resource, and (c) data are described with rich metadata that include accurate and relevant attributes and remain accessible even if the data are no longer available. Box 1 lists attributes that should be included in netCDF files generated out of a biogeochemical ocean model, including several that are specific to OAE research, for the output to be considered a richly documented dataset. Output from an Earth system model would have slightly different requirements regarding the atmosphere (e.g., atmospheric forcing would not apply).
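The three conditions (a) to (c) can be expressed as a simple pre-submission checklist. The sketch below is only an illustration of that idea: the record keys (`openly_available`, `persistent_id`, `indexed_in`, `metadata_attributes`) are hypothetical names, not part of any standard, and a non-empty field is a much weaker test than a real FAIR assessment.

```python
def unmet_fair_conditions(record):
    """Return the labels of the conditions (a)-(c) that a record does not meet."""
    checks = {
        "(a) openly available with a persistent identifier":
            bool(record.get("openly_available"))
            and bool(record.get("persistent_id")),
        "(b) registered and indexed in a searchable resource":
            bool(record.get("indexed_in")),
        "(c) described with rich metadata":
            bool(record.get("metadata_attributes")),
    }
    return [label for label, satisfied in checks.items() if not satisfied]

# Hypothetical record for a model output dataset without an identifier yet.
record = {
    "openly_available": True,
    "persistent_id": None,
    "indexed_in": ["example data portal"],
    "metadata_attributes": {"title": "Example OAE model run"},
}
print(unmet_fair_conditions(record))
```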

The discussion thus far has focused on the operational provision of model outputs, i.e., comprehensive datasets that may be available for periods of weeks to years. However, because of their large data volume, they are not amenable to long-term or permanent archiving. Nevertheless, long-term archiving of model-related information in some form that makes datasets reproducible is required but not yet done routinely. We suggest the following as a best practice:

  1. Metadata should be permanently archived even for operational datasets (as mentioned above, this is required for a dataset to qualify as FAIR).

  2. Essential subsets of operational datasets should be permanently archived, although it may not be immediately clear what these subsets should encompass. At a minimum, data subsets that are required to support conclusions in publications should be archived.

  3. Model code should be permanently archived (e.g., Git versions with DOI), and sufficient metadata should be provided so that investigators can reproduce all model inputs (including initial and boundary conditions, model parameters). This information should allow, in principle, the reproduction of large model output datasets that cannot be permanently archived.

3 Metadata template

Section 2 highlights the importance of including some specific metadata information in netCDF files generated out of ocean model outputs. Apart from fulfilling documentation purposes, such information plays a vital role in facilitating data discovery when utilizing the netCDF–CF–OPeNDAP standard for operational provision of model output data. However, for long-term archiving purposes, data assembly centers commonly implement an independent and comprehensive metadata template. Ideally, these templates should be universally applicable to all data holdings, ensuring comprehensive documentation and accurate discoverability.

Jiang et al. (2015) described a metadata template that can be universally applied to all major types of ocean acidification (OA) data. Its development was driven by the need to document laboratory experiments to study the physiological responses of OA, which was a relatively new type of research at the time. The template benefited from the rich metadata management experiences of the Ocean Metadata Editor (OME) as used by the Carbon Dioxide Information Analysis Center (CDIAC, Oak Ridge, Tennessee, USA). This is especially true for some of the metadata elements associated with ocean carbon parameters, e.g., carbon dioxide fugacity (fCO2). It features a “variable metadata section”, which allows the documentation of all ancillary metadata information of an observed oceanographic variable (e.g., its variable abbreviation, full name, unit, instruments, uncertainty) to be organized around the variable, thus enabling the documentation of rich metadata information for all observed properties. In addition, new metadata elements (e.g., observation type, in situ observation/manipulation condition/response variable, measured or calculated, biological subject, species identification code, life stage) were introduced. As the template was being developed, a bottom-up approach was adopted, and the authors worked with numerous OA scientists from around the world to ensure the produced template conforms to the needs and preferences of the research community. Figure 1 shows a diagram of the relationships between tables to help facilitate the navigation of the many groups of information.

Figure 1. A diagram showing the relationship between Tables 2, 3, 4, 5, 7, and 8.


Table 2. Selected components of the new metadata template. ICES is short for the International Council for the Exploration of the Sea. “NA” is short for “not available”. For the latest version of the metadata template, refer to the metadata template file.


In this paper, an updated metadata template (Version 2.0) is presented to accommodate the documentation of data coming out of OAE research (Table 2). We note that OAE research, while historically linked to acidification research, is distinct in its application and may require additional parameters more akin to those found in iron fertilization or other perturbation studies. The revised template specifically allows users to indicate the type of the OAE study and indicate whether its treatment type is for future ocean acidification conditions or for ocean alkalinity enhancement experiments. It also has a new element for the name of the model. For the “people” sections, the address field is split up into road address, city, state/province, zip code, and country for better machine readability. The original title and abstract are replaced with “dataset title” and “dataset description”, respectively, to make them distinguishable from the title and abstract of a peer-reviewed publication. The names of some other metadata elements were also changed to make them more self-explanatory. A new metadata element called platform type, which is backed with controlled vocabularies, is added to allow data users to filter the datasets based on the type of the specific observing platform. For example, in the future, a user would be able to search for only Saildrone uncrewed surface vehicle (USV)-based measurements. For the funding information section, two new elements about the start date and end date of the project are added. Most importantly, an element is added to enable multiple datasets generated out of a research expedition or experiment to be linked to each other. Terms that were either obsolete or rarely used (e.g., spatial reference system, purpose, section) were discarded.

Dataset title is a very important element of the metadata. It is often one of the few pieces of information a user can see in the search results. Thus, it is critical for data producers to create titles that are descriptive. It is recommended to follow the template of [observed properties] collected from [observation categories] using [instruments] from [research vessels or other platforms] in [sea names] during [research projects] from [start date] to [end date]. Here is one example: Dissolved inorganic carbon, total alkalinity, pH, temperature, salinity and other variables collected from profile and discrete sample observations using CTD, Niskin bottle, and other instruments from R/V Wecoma in the U.S. West Coast California Current System during the 2011 West Coast Ocean Acidification Cruise (WCOA2011) from 2011-08-12 to 2011-08-30.
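The recommended title template maps directly onto a small formatting helper. The following sketch assembles a title from its bracketed components; the function and parameter names are illustrative, and the inputs reproduce the WCOA2011 example above.

```python
def build_dataset_title(observed_properties, observation_categories,
                        instruments, platforms, sea_names, project,
                        start_date, end_date):
    """Assemble a dataset title following the recommended template."""
    return (
        f"{observed_properties} collected from {observation_categories} "
        f"using {instruments} from {platforms} in {sea_names} "
        f"during {project} from {start_date} to {end_date}"
    )

title = build_dataset_title(
    observed_properties=("Dissolved inorganic carbon, total alkalinity, pH, "
                         "temperature, salinity and other variables"),
    observation_categories="profile and discrete sample observations",
    instruments="CTD, Niskin bottle, and other instruments",
    platforms="R/V Wecoma",
    sea_names="the U.S. West Coast California Current System",
    project="the 2011 West Coast Ocean Acidification Cruise (WCOA2011)",
    start_date="2011-08-12",
    end_date="2011-08-30",
)
print(title)
```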

Dataset description is similar to the abstract of a publication, encompassing essential information on data collection and generation methods, the variables and attributes present in the dataset, as well as any limitations or restrictions on data usage. Moreover, it may provide instructions on accessing and utilizing the data. Here is an example of a well-crafted dataset description: This dataset contains discrete bottle (CTD profile) data of the first West Coast Ocean Acidification cruise (WCOA2011). The cruise took place aboard R/V Wecoma from August 12 to 30 in 2011. Ninety-five stations were occupied from northern Washington to southern California along 13 transect lines on the west coast of the United States. At all stations, CTD casts were conducted, and discrete water samples were collected with Niskin bottles. Inorganic ocean carbon variables, including dissolved inorganic carbon (DIC), total alkalinity (TA), pH, as well as dissolved oxygen, and nutrients (silicate, phosphate, and nitrate) were measured. The cruise was designed to obtain a synoptic snapshot of key carbon, physical, and biogeochemical parameters as they relate to ocean acidification (OA) in the coastal realm. During the cruise, some of the same transect lines were occupied as during the 2007 West Coast Carbon cruise, as well as many CalCOFI stations. This effort was conducted in support of the coastal monitoring and research objectives of the NOAA Ocean Acidification Program (OAP).

Table 3. Metadata elements available for each observed property in the generic variable metadata section. “NA” is short for “not available”. For the latest version of the metadata template, refer to the metadata template file.


One of the most important elements of the above metadata template is the “variable metadata section” (Jiang et al., 2015). It enables all ancillary information of a variable to be organized around the observed property (Table 3). Note that here “variables” refer to observed oceanographic properties (e.g., temperature, salinity, dissolved oxygen, pH, nitrate). They should not be confused with other supporting variables such as EXPOCODE, Cruise_ID, year, month, day, yearday, longitude, latitude, depth, flags, etc. The latter elements are important for understanding the dataset, but the “variable metadata section” as described here is not applicable to them. Table 3 shows the available metadata elements for a generic oceanographic variable. Customized variable metadata sections for ocean carbon variables (DIC, TA, fCO2, and pH) allow additional information to be documented. Refer to the metadata template file for more details about these metadata elements (last access: 5 November 2023).

Within the “variable metadata section”, the metadata element of “in situ observation/manipulation condition/response variable” in Jiang et al. (2015) was replaced with “in situ or manipulated”. This change simplified the term without compromising its purpose of differentiating whether a variable is observed in situ or manipulated. New elements such as “discrete or continuous”, “manipulation method”, “calculation method and parameters”, “sampling method”, “analyzing method”, “calibration info”, “QC steps taken”, and “weather or climate quality” were also added. Refer to Table 3 for their detailed descriptions. Metadata elements that were rarely used, such as “purpose”, “sections (cruise legs)”, “duration (for experiment/settlement/colonization methods)”, and “spatial reference system”, were eliminated.
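One way to picture the variable metadata section is as a record type in which every ancillary element hangs off the observed property. The sketch below uses a Python dataclass for this; the attribute names are an informal rendering of a subset of the elements named in the text, and the allowed values for the two validated fields are assumptions rather than the template's official controlled terms.

```python
from dataclasses import dataclass, field

# Assumed allowed values, for illustration only.
ALLOWED = {
    "in_situ_or_manipulated": {"in situ", "manipulated"},
    "discrete_or_continuous": {"discrete", "continuous"},
    "measured_or_calculated": {"measured", "calculated"},
}

@dataclass
class VariableMetadata:
    """Ancillary metadata organized around one observed property."""
    abbreviation: str
    full_name: str
    unit: str
    in_situ_or_manipulated: str
    discrete_or_continuous: str
    measured_or_calculated: str
    instruments: list = field(default_factory=list)
    uncertainty: str = "NA"
    sampling_method: str = "NA"
    analyzing_method: str = "NA"
    calibration_info: str = "NA"
    qc_steps_taken: str = "NA"

    def __post_init__(self):
        # Reject values outside the assumed vocabularies.
        for attr, options in ALLOWED.items():
            value = getattr(self, attr)
            if value not in options:
                raise ValueError(
                    f"{attr}={value!r} is not one of {sorted(options)}"
                )

ta = VariableMetadata(
    abbreviation="TA",
    full_name="total alkalinity",
    unit="µmol/kg",
    in_situ_or_manipulated="in situ",
    discrete_or_continuous="discrete",
    measured_or_calculated="measured",
    instruments=["example titrator"],  # hypothetical instrument name
)
print(ta)
```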

4 Controlled vocabularies

For OAE data management, metadata elements that should be supported with controlled vocabularies include observed properties (e.g., DIC, TA, dissolved oxygen), observation or study types (e.g., surface underway, time series), platforms (e.g., research vessels), sea names, instruments, people, institutions, countries, etc. For platforms, refer to the International Council for the Exploration of the Sea (ICES; last access: 5 November 2023). For sea names, it is recommended to use the SeaDataNet C16 list (sea areas; last access: 5 November 2023). For countries, use the SeaDataNet C32 list (International Standards Organisation Countries; last access: 5 November 2023). For investigator names, it is recommended to use the list as managed by ORCID (last access: 5 November 2023). For institutions, refer to the Research Organization Registry (ROR; last access: 5 November 2023). Another two groups of controlled vocabularies related to OAE studies are presented here: (a) types of OAE studies (Table 4) and (b) types of source materials for OAE (Table 5).

Table 4. Controlled vocabularies for major types of OAE studies. NVS is short for NERC Vocabulary Server (last access: 5 November 2023). SDN is short for SeaDataNet. “NA” is short for “not available”. Refer to Table 1 for more information about some of these study types. For the latest version of this list, refer to (last access: 5 November 2023).


Table 5. Controlled vocabularies for source materials for OAE (based on Renforth and Henderson, 2017; Caserini et al., 2022). See also Eisaman et al. (2023, this Guide). For the latest version of this list, refer to (last access: 5 November 2023).


Table 6. Variables related to total dissolved inorganic carbon (DIC) content within the Climate and Forecast (CF) conventions (last access: 5 November 2023).


Controlled vocabularies play a crucial role in data management, enabling researchers to describe their data in a standardized and precise way. Among the various types of controlled vocabularies, observed properties are particularly important, as they describe the measurable characteristics of a survey or experiment. However, observed properties also pose some challenges, as the terms used to describe them can be highly specialized and context-dependent. For example, different prefixes and postfixes may be added to the same basic term, resulting in a proliferation of narrow and highly specific terms (see examples in Table 6). This can make it difficult to find the right term for a given purpose and can also lead to inconsistencies and confusion. Furthermore, different communities may use slightly different terms to describe the same property or may have different conventions for expressing units and dimensions.

The current setup makes it necessary to create multiple variations of the same property, defeating the purpose of controlled vocabularies. Moving forward, it is important to develop clear guidelines and standards to foster collaboration and communication among different communities. Specifically, it is recommended to manage controlled vocabularies for different types of information separately. Imagine that the CF convention had only one clean term called “dissolved inorganic carbon”, with a preferred unit of “µmol/kg”. The list would be significantly shorter, and each of the terms would be much more broadly used. It would also be much more cost-effective to manage a shorter list. Ideally, such vocabulary development efforts should be driven by the scientific community to ensure their accuracy and to ensure that the developed list conforms to the needs and preferences of the community's research. Until those clean lists are developed, it is recommended to use the list as documented in Table 1 of Jiang et al. (2015) for the purpose of standardizing observed properties.
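The benefit of a single clean term can be illustrated with a small lookup that maps spelling variants and synonyms onto one preferred entry. In the sketch below, the preferred term and unit are the ones imagined in the text, while the list of variants is a hypothetical sample and not an official vocabulary.

```python
# Preferred term and unit as imagined in the text above.
PREFERRED_UNITS = {"dissolved inorganic carbon": "µmol/kg"}

# Hypothetical variants mapped to the preferred term (illustration only).
VARIANT_TO_TERM = {
    "dic": "dissolved inorganic carbon",
    "tco2": "dissolved inorganic carbon",
    "total dissolved inorganic carbon": "dissolved inorganic carbon",
    "dissolved inorganic carbon": "dissolved inorganic carbon",
}

def normalize_property(name):
    """Return the preferred term for a submitted name, or None if unknown."""
    # Ignore case and collapse runs of whitespace before the lookup.
    key = " ".join(name.lower().split())
    return VARIANT_TO_TERM.get(key)

for submitted in ["DIC", "Total  Dissolved Inorganic Carbon", "chlorophyll"]:
    term = normalize_property(submitted)
    unit = PREFERRED_UNITS.get(term)
    print(f"{submitted!r} -> {term!r} ({unit})")
```

Names that are not recognized return `None` rather than a guess, so they can be flagged for review by a data manager.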

Table 7. Controlled vocabularies for platform types. NVS is short for NERC Vocabulary Server. SDN is short for SeaDataNet. For the latest version of this list, refer to (last access: 5 November 2023).


Table 8. Controlled vocabularies for instrument types. NVS is short for NERC Vocabulary Server. SDN is short for SeaDataNet. “NA” is short for “not available”. For the latest version of this list, refer to (last access: 5 November 2023).


Additionally, two new types of controlled vocabularies were introduced. In the metadata template described by Jiang et al. (2015), a metadata section called platform is used to document the platform information. This section contains information such as platform name, ID, type, owner, and country. Of these elements, the platform type could play an important role when it comes to data search purposes. SeaDataNet manages a similar list called “seavox platform categories” (L06) for this purpose. However, it does not cover all the terms the OAE research needs. In this paper, we introduce a new list for this purpose (Table 7). Similarly, SeaDataNet has a list called “device categories” (L05) for the types of instruments, although it does not have all the needed terms for OAE research. Table 8 lists instruments that are most likely used in this field.

5 Data citation

For oceanographic research, data citation commonly includes information such as a list of ordered authors, publication year, title, version, repository, and persistent identifier (e.g., DOI or URL) for the dataset. Here is an example of a good data citation: Feely, Richard A.; Alin, Simone R.; Hales, Burke; Johnson, Gregory C.; Juranek, Laurie W.; Byrne, Robert H.; Peterson, William T.; Goni, Miguel; Liu, Xuewu; Greeley, Dana (2015). Dissolved inorganic carbon, total alkalinity, pH, temperature, salinity and other variables collected from profile and discrete sample observations using CTD, Niskin bottle, and other instruments from R/V Wecoma in the U.S. West Coast California Current System during the 2011 West Coast Ocean Acidification Cruise (WCOA2011) from 2011-08-12 to 2011-08-30 (NCEI Accession 0123467). Version 3.3. NOAA National Centers for Environmental Information. Dataset. Accessed on 2023-03-15.
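The components listed above can be assembled programmatically, which helps keep citations consistent across a repository. The sketch below follows the ordering of the example citation; the function name and all input values are hypothetical placeholders, and the optional `identifier` argument stands in for a DOI or URL.

```python
def format_data_citation(authors, year, title, version, repository,
                         accessed, identifier=None):
    """Join the citation components in the order used by the example above."""
    parts = [
        f"{'; '.join(authors)} ({year}).",
        f"{title}.",
        f"Version {version}.",
        f"{repository}.",
        "Dataset.",
    ]
    if identifier:
        parts.append(f"{identifier}.")
    parts.append(f"Accessed on {accessed}.")
    return " ".join(parts)

# Placeholder values, not a real dataset.
citation = format_data_citation(
    authors=["Doe, Jane", "Roe, Richard"],
    year=2023,
    title="Example OAE dataset title",
    version="1.0",
    repository="Example Data Center",
    accessed="2023-03-15",
)
print(citation)
```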

There are three important considerations when it comes to minting DOIs for datasets. Firstly, it is advisable to avoid using different DOIs for different versions of the same dataset. Instead, it is recommended to mint one DOI that covers all versions of the dataset. This approach ensures that users with a DOI can always access the latest version of the dataset, as well as any historical versions. To differentiate between versions, the citation for the dataset should include its version information. Secondly, it is crucial to wait until the dataset is published in a long-term archive with a stable link before minting a DOI. A DOI is only as reliable as the link it resolves to, so it is essential to ensure that the link is stable and will not change in the future. If the link changes later on, the DOI will become broken. Thirdly, it is important to ensure that only one DOI is assigned to a dataset in the data flow. It is not uncommon for a dataset to be submitted to a data assembly center and be forwarded to another data assembly center for different purposes later on. To avoid the risk of confusing users with multiple versions of the same dataset in different places, it is essential to make sure that only one DOI is minted for the authoritative version of the dataset. According to the NOAA plan to increase Public Access to Research Results (PARR), only NOAA National Data Centers are authorized to mint DOIs for NOAA-funded datasets (NOAA, 2015).
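The first consideration, one DOI covering all versions, can be pictured as a small registry in which a single identifier resolves to the latest version while every earlier version stays retrievable. The sketch below is a toy model of that idea under stated assumptions: the class, the DOI string, and the storage locations are hypothetical and do not describe how any particular data center implements versioning.

```python
class VersionedDataset:
    """Toy registry: one DOI, many permanently retained versions."""

    def __init__(self, doi):
        self.doi = doi
        self._versions = {}  # version label -> location, in publication order

    def publish(self, version, location):
        # Published versions are never overwritten or removed.
        if version in self._versions:
            raise ValueError(f"version {version} is already published")
        self._versions[version] = location

    def latest(self):
        """Return (version, location) of the most recently published version."""
        version = next(reversed(self._versions))
        return version, self._versions[version]

    def get(self, version):
        """Return the location of a specific historical version."""
        return self._versions[version]

dataset = VersionedDataset("10.1234/example-dataset")  # hypothetical DOI
dataset.publish("1.1", "archive/example/v1.1/")
dataset.publish("2.2", "archive/example/v2.2/")
print(dataset.doi, dataset.latest(), dataset.get("1.1"))
```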

6 Data repositories

Ideally, scientists should only need to submit their data once, and all distributed data assembly centers act as regional nodes, thereby contributing to the availability of ocean carbon and acidification data through a centralized data portal. Achieving this goal requires the provision of standardized metadata to the search engine of the agreed-upon one-stop portal. The most recent data management initiative by the UN Ocean Acidification Research for Sustainability (OARS) recommends the use of the GOA-ON Portal as the envisioned one-stop OA data portal. Once implemented, users can use the GOA-ON Portal to search for and access all ocean carbon and acidification data of a specific type. Upon discovering a dataset through the portal, the user can then return to the respective regional data assembly center to access the data files and locate pertinent metadata information. In order for the abovementioned federated system to work, each data assembly center would need to meet the following standards:

  1. A long-term archive ensures uninterrupted data access into the future.

  2. Strict version control capabilities preserve all historical versions of a dataset on a permanent basis.

  3. An online submission interface enables users to prepare metadata in a machine-readable format and to upload data files. Ideally, it should incorporate a user profile management interface, enabling users to keep track of all historical submissions and resume a submission at a later time.

  4. A community-driven common metadata template supports the management of comprehensive metadata information needed for ocean alkalinity enhancement research.

  5. Metadata are stored in both of the following formats:

    a. a user-friendly format for human readability (e.g., HTML);

    b. a machine-readable format to facilitate machine-to-machine interoperability (e.g., XML, SQL).

  6. Controlled vocabularies are applied to various aspects of the metadata to ensure easy machine-to-machine metadata exchange and successful data findability.

  7. Data citation uses permanent digital object identifiers (DOIs).

  8. An existing mechanism shares standardized metadata with the search engine of the agreed-upon data portal.
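
Two of the requirements above, dual-format metadata storage and controlled vocabularies, can be sketched in a few lines. This is a minimal Python illustration using only the standard library. The field names, the `PLATFORM_VOCAB` terms, and the helper functions are hypothetical and do not reproduce the metadata template or the vocabulary tables of this paper.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary; the real term lists live in the
# paper's vocabulary tables and are not reproduced here.
PLATFORM_VOCAB = {"mooring", "glider", "research vessel"}


def validate(record: dict) -> None:
    """Reject a record whose platform is not a controlled term."""
    if record["platform"] not in PLATFORM_VOCAB:
        raise ValueError(f"unknown platform term: {record['platform']!r}")


def to_xml(record: dict) -> str:
    """Machine-readable rendering: one XML element per metadata field."""
    root = ET.Element("metadata")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_html(record: dict) -> str:
    """Human-readable rendering: a two-column HTML table."""
    rows = "".join(
        f"<tr><th>{html.escape(key)}</th><td>{html.escape(str(value))}</td></tr>"
        for key, value in record.items()
    )
    return f"<table>{rows}</table>"


record = {"title": "Example OAE study", "platform": "mooring", "version": "1.0"}
validate(record)
print(to_xml(record))
print(to_html(record))
```

Both renderings are generated from the same record, so the human-readable and machine-readable copies cannot drift apart.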

Before such a system is established, it is recommended to share a copy of the data with the Ocean Carbon and Acidification Data System (OCADS) at NOAA's National Centers for Environmental Information (NCEI) or other qualified data assembly centers to ensure timely inclusion into data products, e.g., the Surface Ocean CO2 Atlas (SOCAT) and Global Ocean Data Analysis Product Version 2 (GLODAPv2). OCADS manages a wide range of ocean carbon and acidification data, including chemical, physical, and biological observations collected from research vessels, ships of opportunity, and uncrewed platforms, as well as laboratory experiment results, and model outputs (Jiang et al., 2023). It has an established setup to channel incoming datasets to existing data products, such as SOCAT (Bakker et al., 2016) and GLODAPv2 (Lauvset et al., 2022). OCADS welcomes submissions from scientists and institutions around the world. Follow this link to access the home page of OCADS: (last access: 5 November 2023). Genetics or eDNA raw data are an exception and should be sent to the National Center for Biotechnology Information (NCBI).

In Europe, in situ OA data are typically submitted to National Oceanographic Data Centres (NODCs) along with other types of measurements. Some research groups may also submit their OA data to specialized data assembly centers like SOCAT or publish their experimental data through data publishers like Pangaea. Government monitoring agencies in northern Europe typically send their OA data to the International Council for the Exploration of the Sea (ICES). Data centers may then integrate these data with other measurements in their databases using controlled vocabularies and standardized metadata elements. Since the late 1990s, data centers and associated organizations involved in marine data collection, management, and curation in European countries have collaborated as part of SeaDataNet, SeaDataNet 2, and SeaDataCloud. These projects have developed and adopted common standards for vocabularies, metadata schemas, data formats, and quality control procedures, enabling harmonization and interoperability of diverse marine data across Europe. The SeaDataNet infrastructure and common standards are critical to the operation and strengthening of key data workflows that feed into the European Marine Observation and Data Network (EMODnet), created to support the EU's integrated maritime policies. EMODnet Chemistry generates data products and provides centralized access to data relevant to the implementation of European Union maritime policies, with OA data being one of the four main focuses alongside eutrophication, contaminants, and marine litter. However, the workflow for OA data in Europe is not yet well-established, and there is an opportunity to build a harmonized workflow from data creators to data centers to data aggregators and product creators. Collaboration between data curators, IT and semantic specialists, and scientists can help enrich the semantic annotation of OA datasets with essential metadata information, which is needed to support OA research and monitoring efforts.

7 Conclusions

This paper offers comprehensive guidelines for OAE researchers to prepare their metadata and data for submission to long-term archives. These guidelines encompass a wide range of OAE data types, including discrete bottle measurements and autonomous measurements from surface underway and uncrewed platforms such as moorings, Saildrones, gliders, and Argo floats. Furthermore, they address physiological response studies conducted in various settings, such as laboratory experiments, mesocosms, field experiments, and natural analogues. The paper also provides a universal metadata template and data standards tailored to each type of OAE data. Additionally, it presents controlled vocabularies for observation types, alkalinization methods, platform types, and instruments. These guidelines are also applicable to ocean acidification data.

Key recommendations for data reporting

  • Gather metadata elements using the most recent version of the OAE-compatible metadata template (Tables 2 and 3): (last access: 5 November 2023).

  • Wherever feasible, utilize the suggested OAE-compatible controlled vocabularies for metadata fields (Tables 4, 5, 7, and 8).

  • Prepare data files in accordance with the specific data standard designated for the relevant OAE research type (Table 1).

Data availability

No data sets were used in this article.

Author contributions

This paper was a collaborative effort with contributions from all authors. LQJ prepared the initial draft, KF crafted the modeling section, and the remaining sections were assembled collectively. AVS and JPG critically examined the chemical and physical dimensions of the guidelines, while DB focused on reviewing the biological aspect.

Competing interests

Competing interests are declared in a summary for the entire volume at:


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

We thank the Ocean Acidification and other ocean Changes – Impacts and Solutions (OACIS), an initiative of the Prince Albert II of Monaco Foundation, for its support throughout the project. We extend our gratitude to the Villefranche Oceanographic Laboratory for supporting the meeting of the lead authors in January 2023. We are grateful to Gwenaelle Moncoiffé of the British Oceanographic Data Centre (Liverpool, United Kingdom) for her tremendous contributions to writing the sections on controlled vocabularies and European data management activities. We thank Patrick Hogan and Scott Cross of the NOAA National Centers for Environmental Information for providing excellent comments during the internal review process. We also thank Liem Nguyen and Atharva Bhalke of the University of Maryland for helping create the corresponding web pages for Tables 4, 5, 7, and 8.

Funding for Li-Qing Jiang was from NOAA Ocean Acidification Program (OAP, Project ID: 21047) and NOAA National Centers for Environmental Information (NCEI) through a NOAA Cooperative Institute for Satellite Earth System Studies (CISESS) grant (NA19NES4320002) at the Earth System Science Interdisciplinary Center (ESSIC), University of Maryland. Adam V. Subhas acknowledges support from ClimateWorks for natural analogs to ocean alkalinity enhancement, and from the Locking away Ocean Carbon in the Northeast Shelf and Slope (LOC-NESS) project funded by the Carbon to Sea Initiative. Daniela Basso was funded by the Ocean Acidification and other ocean Changes – Impacts and Solutions (OACIS) Initiative (Project grant number 4000.1-1) of the Prince Albert II of Monaco Foundation. Katja Fennel acknowledges funding from the Natural Sciences and Engineering Research Council of Canada (NSERC)'s Discovery and Alliance Programs, and the Ocean Alk-Align project funded by Carbon to Sea.

Financial support

This research has been supported by the ClimateWorks Foundation (grant no. 22-0296) and the Prince Albert II of Monaco Foundation. It has also been supported by the NOAA Ocean Acidification Program (OAP, Project ID: 21047); NOAA National Centers for Environmental Information (NCEI) through a NOAA Cooperative Institute for Satellite Earth System Studies (CISESS) grant (grant no. NA19NES4320002) at the Earth System Science Interdisciplinary Center (ESSIC), University of Maryland; ClimateWorks for natural analogs to ocean alkalinity enhancement; the Locking away Ocean Carbon in the Northeast Shelf and Slope (LOC-NESS) project funded by the Carbon to Sea Initiative; the Ocean Acidification and other ocean Changes – Impacts and Solutions (OACIS) Initiative (grant no. 4000.1-1) of the Prince Albert II of Monaco Foundation; the Natural Sciences and Engineering Research Council of Canada (NSERC)'s Discovery and Alliance Programs; and the Ocean Alk-Align project funded by Carbon to Sea.

Review statement

This paper was edited by Terre Satterfield and reviewed by Lennart Bach and one anonymous referee.

References

Bakker, D. C. E., Pfeil, B., Landa, C. S., Metzl, N., O'Brien, K. M., Olsen, A., Smith, K., Cosca, C., Harasawa, S., Jones, S. D., Nakaoka, S., Nojiri, Y., Schuster, U., Steinhoff, T., Sweeney, C., Takahashi, T., Tilbrook, B., Wada, C., Wanninkhof, R., Alin, S. R., Balestrini, C. F., Barbero, L., Bates, N. R., Bianchi, A. A., Bonou, F., Boutin, J., Bozec, Y., Burger, E. F., Cai, W.-J., Castle, R. D., Chen, L., Chierici, M., Currie, K., Evans, W., Featherstone, C., Feely, R. A., Fransson, A., Goyet, C., Greenwood, N., Gregor, L., Hankin, S., Hardman-Mountford, N. J., Harlay, J., Hauck, J., Hoppema, M., Humphreys, M. P., Hunt, C. W., Huss, B., Ibánhez, J. S. P., Johannessen, T., Keeling, R., Kitidis, V., Körtzinger, A., Kozyr, A., Krasakopoulou, E., Kuwata, A., Landschützer, P., Lauvset, S. K., Lefèvre, N., Lo Monaco, C., Manke, A., Mathis, J. T., Merlivat, L., Millero, F. J., Monteiro, P. M. S., Munro, D. R., Murata, A., Newberger, T., Omar, A. M., Ono, T., Paterson, K., Pearce, D., Pierrot, D., Robbins, L. L., Saito, S., Salisbury, J., Schlitzer, R., Schneider, B., Schweitzer, R., Sieger, R., Skjelvan, I., Sullivan, K. F., Sutherland, S. C., Sutton, A. J., Tadokoro, K., Telszewski, M., Tuma, M., van Heuven, S. M. A. C., Vandemark, D., Ward, B., Watson, A. J., and Xu, S.: A multi-decade record of high-quality fCO2 data in version 3 of the Surface Ocean CO2 Atlas (SOCAT), Earth Syst. Sci. Data, 8, 383–413, 2016.

Balaji, V., Taylor, K. E., Juckes, M., Lawrence, B. N., Durack, P. J., Lautenschlager, M., Blanton, C., Cinquini, L., Denvil, S., Elkington, M., Guglielmo, F., Guilyardi, E., Hassell, D., Kharin, S., Kindermann, S., Nikonov, S., Radhakrishnan, A., Stockhause, M., Weigel, T., and Williams, D.: Requirements for a global data infrastructure in support of CMIP6, Geosci. Model Dev., 11, 3659–3680, 2018.

Berman, F. and Fox, G.: The role of data standards in science and society, Commun. ACM, 56, 38–44, 2013.

Brett, A., Leape, J., Abbott, M., Sakaguchi, H., Cao, L., Chand, K., Golbuu, Y., Martin, T. J., Mayorga, J., and Myksvoll, M. S.: Ocean data need a sea change to help navigate the warming world, Nature, 582, 181–183, 2020.

Caserini, S., Storni, N., and Grosso, M.: The availability of limestone and other raw materials for ocean alkalinity enhancement, Global Biogeochem. Cy., 36, e2021GB007246, 2022.

Cyronak, T., Albright, R., and Bach, L. T.: Field experiments in ocean alkalinity enhancement research, in: Guide to Best Practices in Ocean Alkalinity Enhancement Research, edited by: Oschlies, A., Stevenson, A., Bach, L. T., Fennel, K., Rickaby, R. E. M., Satterfield, T., Webb, R., and Gattuso, J.-P., Copernicus Publications, State Planet, 2-oae2023, 7, 2023.

de La Beaujardière, J., Beegle-Krause, C. J., Bermudez, L., Hankin, S., Hazard, L., Howlett, E., Le, S., Proctor, R., Signell, R. P., Snowden, D., and Thomas, J.: Ocean and Coastal Data Management, Proceedings of OceanObs'09: Sustained Ocean Observations and Information for Society, 21–25 September 2009, Venice, Italy, 2010.

Eisaman, M. D., Geilert, S., Renforth, P., Bastianini, L., Campbell, J., Dale, A. W., Foteinis, S., Grasse, P., Hawrot, O., Löscher, C. R., Rau, G. H., and Rønning, J.: Assessing the technical aspects of ocean-alkalinity-enhancement approaches, in: Guide to Best Practices in Ocean Alkalinity Enhancement Research, edited by: Oschlies, A., Stevenson, A., Bach, L. T., Fennel, K., Rickaby, R. E. M., Satterfield, T., Webb, R., and Gattuso, J.-P., Copernicus Publications, State Planet, 2-oae2023, 3, 2023.

Fennel, K., Long, M. C., Algar, C., Carter, B., Keller, D., Laurent, A., Mattern, J. P., Musgrave, R., Oschlies, A., Ostiguy, J., Palter, J. B., and Whitt, D. B.: Modelling considerations for research on ocean alkalinity enhancement (OAE), in: Guide to Best Practices in Ocean Alkalinity Enhancement Research, edited by: Oschlies, A., Stevenson, A., Bach, L. T., Fennel, K., Rickaby, R. E. M., Satterfield, T., Webb, R., and Gattuso, J.-P., Copernicus Publications, State Planet, 2-oae2023, 9, 2023.

Friederich, G. E., Brewer, P. G., Herlien, R., and Chavez, F. P.: Measurement of sea surface partial pressure of CO2 from a moored buoy, Deep-Sea Res. Pt. I, 42, 1175–1186, 1995.

Guenther, R. and Radebaugh, J.: Understanding Metadata, National Information Standards Organization (NISO) Press, Bethesda, Maryland, USA, ISBN 1-880124-62-9, 2004. 

Hankin, S. C., Blower, J. D., Carval, T., Casey, K. S., Donlon, C., Lauret, O., Loubrieu, T., Srinivasan, A., Trinanes, J., Godøy, Ø., Mendelssohn, R., Signell, R. P., de La Beaujardiere, J., Cornillon, P., Blanc, F., Rew, R., and Harlan, J.: NetCDF-CF-OPeNDAP: Standards for Ocean Data Interoperability and Object Lessons for Community Data Standards Processes, Proceedings of OceanObs'09: Sustained Ocean Observations and Information for Society, 21–25 September 2009, Venice, Italy, 2010.

Hassell, D., Gregory, J., Blower, J., Lawrence, B. N., and Taylor, K. E.: A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1), Geosci. Model Dev., 10, 4619–4646, 2017.

Iglesias-Rodríguez, M. D., Rickaby, R. E. M., Singh, A., and Gately, J. A.: Laboratory experiments in ocean alkalinity enhancement research, in: Guide to Best Practices in Ocean Alkalinity Enhancement Research, edited by: Oschlies, A., Stevenson, A., Bach, L. T., Fennel, K., Rickaby, R. E. M., Satterfield, T., Webb, R., and Gattuso, J.-P., Copernicus Publications, State Planet, 2-oae2023, 5, 2023.

Jiang, L.-Q., O'Connor, S. A., Arzayus, K. M., and Parsons, A. R.: A metadata template for ocean acidification data, Earth Syst. Sci. Data, 7, 117–125, 2015.

Jiang, L.-Q., Pierrot, D., Wanninkhof, R., Feely, R. A., Tilbrook, B., Alin, S., Barbero, L., Byrne, R. H., Carter, B. R., Dickson, A. G., Gattuso, J.-P., Greeley, D., Hoppema, M., Humphreys, M. P., Karstensen, J., Lange, N., Lauvset, S. K., Lewis, E. R., Olsen, A., Perez, F. F., Sabine, C., Sharp, J. D., Tanhua, T., Trull, T. W., Velo, A., Allegra, A. J., Barker, P., Burger, E., Cai, W.-J., Chen, C.-T. A., Cross, J., Garcia, H., Hernandez-Ayon, J. M., Hu, X., Kozyr, A., Langdon, C., Lee, K., Salisbury, J., Wang, Z. A., and Xue, L.: Best practice data standards for discrete chemical oceanographic observations, Frontiers in Marine Science, 8, 705638, 2022.

Jiang, L.-Q., Kozyr, A., Relph, J., Ronje, E., Kamb, L., Burger, E., Myer, J., Nguyen, L., Arzayus, K. M., Boyer, T., Cross, S., Garcia, H., Hogan, P., Larsen, K., and Parsons, A. R.: The ocean carbon and acidification data system, Scientific Data, 10, 136, 2023.

Lauvset, S. K., Lange, N., Tanhua, T., Bittig, H. C., Olsen, A., Kozyr, A., Alin, S., Álvarez, M., Azetsu-Scott, K., Barbero, L., Becker, S., Brown, P. J., Carter, B. R., da Cunha, L. C., Feely, R. A., Hoppema, M., Humphreys, M. P., Ishii, M., Jeansson, E., Jiang, L.-Q., Jones, S. D., Lo Monaco, C., Murata, A., Müller, J. D., Pérez, F. F., Pfeil, B., Schirnick, C., Steinfeldt, R., Suzuki, T., Tilbrook, B., Ulfsbo, A., Velo, A., Woosley, R. J., and Key, R. M.: GLODAPv2.2022: the latest version of the global interior ocean biogeochemical data product, Earth Syst. Sci. Data, 14, 5543–5572, 2022.

Lueker, T. J., Dickson, A. G., and Keeling, C. D.: Ocean pCO2 calculated from dissolved inorganic carbon, alkalinity, and equations for K1 and K2: validation based on laboratory measurements of CO2 in gas and seawater at equilibrium, Mar. Chem., 70, 105–119, 2000. 

National Oceanic and Atmospheric Administration (NOAA): NOAA plan for increasing public access to research results: a response to the White House Office of Science and Technology Policy memorandum “Increasing access to the results of Federally funded scientific research” issued February 22, 2013, National Oceanic and Atmospheric Administration, Washington, D.C., 2015.

Newton, J. A., Feely, R. A., Jewett, E. B., Williamson, P., and Mathis, J.: Global Ocean Acidification Observing Network: Requirements and Governance Plan, 2nd edn., Global Ocean Acidification Observing Network (GOA-ON), (last access: 5 November 2023), 2015.

Pardis, W., Grabb, K. C., DeGrandpre, M. D., Spaulding, R., Beck, J., Pfeifer, J. A., and Long, D. M.: Measuring protons with photons: A hand-held, spectrophotometric pH analyzer for ocean acidification research, community science and education, Sensors, 22, 7924, 2022.

Renforth, P. and Henderson, G.: Assessing ocean alkalinity for carbon sequestration, Rev. Geophys., 55, 636–674, 2017.

Riebesell, U., Basso, D., Geilert, S., Dale, A. W., and Kreuzburg, M.: Mesocosm experiments in ocean alkalinity enhancement research, in: Guide to Best Practices in Ocean Alkalinity Enhancement Research, edited by: Oschlies, A., Stevenson, A., Bach, L. T., Fennel, K., Rickaby, R. E. M., Satterfield, T., Webb, R., and Gattuso, J.-P., Copernicus Publications, State Planet, 2-oae2023, 6, 2023.

Subhas, A. V., Lehmann, N., and Rickaby, R. E. M.: Natural analogs to ocean alkalinity enhancement, in: Guide to Best Practices in Ocean Alkalinity Enhancement Research, edited by: Oschlies, A., Stevenson, A., Bach, L. T., Fennel, K., Rickaby, R. E. M., Satterfield, T., Webb, R., and Gattuso, J.-P., Copernicus Publications, State Planet, 2-oae2023, 8, 2023.

Tanhua, T., McCurdy, A., Fischer, A., Appeltans, W., Bax, N., Currie, K., DeYoung, B., Dunn, D., Heslop, E., Glover, L. K., Gunn, J., Hill, K., Ishii, M., Legler, D., Lindstrom, E., Miloslavich, P., Moltmann, T., Nolan, G., Palacz, A., Simmons, S., Sloyan, B., Smith, L. M., Smith, N., Telszewski, M., Visbeck, M., and Wilkin, J.: What we have learned from the Framework for Ocean Observing: Evolution of the global ocean observing system, Frontiers in Marine Science, 6, 471, 2019.

TGDCSP (Task Group on Data Citation Standards and Practices, CODATA-ICSTI): Out of cite, out of mind: The current state of practice, policy, and technology for the citation of data, Data Science Journal, 12, CIDCR1–CIDCR75, 2013.

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B.: The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, 3, 160018, 2016.

Zeng, M. L. and Qin, J.: Metadata, Chap. 4, Controlled Vocabularies, Neal-Schuman Publishers, New York, NY, 61–77, ISBN 9781555706357, 2008.

Short summary
This paper provides comprehensive guidelines for ocean alkalinity enhancement (OAE) researchers on archiving their metadata and data. It includes data standards for various OAE studies and a universal metadata template. Controlled vocabularies for terms like alkalinization methods are included. These guidelines also apply to ocean acidification data.