Data citation is the practice of providing a reference to a dataset, just as researchers routinely provide bibliographic references to sources such as journal articles, reports and conference papers.
Why cite data?
Researchers should cite data in the same way that they cite other sources of information, such as articles and books.
Data citation can help by:
- Enabling easy reuse and verification of data;
- Allowing the impact of data to be tracked; and
- Creating a scholarly structure that recognises and rewards data producers.
Is data citation the same as citing published papers?
While there are established conventions for citing published papers, the accepted forms and content of data citations are not always as clear, especially when the data are published online. Conventions are expected to solidify as more and more data become available online.
A data citation should appear in two places: within the text of the article and in the reference list. The in-text citation should provide enough information to identify the corresponding entry in the reference list.
There is no consensus on format and components for citations of electronic data. Emerging conventions vary by discipline, but there are common elements within these conventions.
Common elements of data citation
Lancaster University recommends using a recognised citation approach such as that of DataCite. The suggested format of DataCite is:
- Creator(s) (the main researchers involved in producing the data, or the authors of the publication, in priority order)
- Publication Year (the date when the dataset was published or released rather than the collection or coverage date)
- Title (including the edition or version number, if applicable)
- Publisher (the Data Centre or University/Institute that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource)
- Identifier (for citation purposes DataCite recommends using a Digital Object Identifier (DOI), a persistent identifier that can be written as a linkable URL)
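The five elements above can be assembled into a citation string mechanically. The following is a minimal Python sketch, not an official DataCite tool: the `DataCitation` class and its `format` method are hypothetical names introduced here for illustration, and the output pattern "Creator (Year): Title. Publisher. Identifier" is assumed from the DataCite examples given on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical container for the five DataCite elements (illustration only)."""
    creators: List[str]  # in priority order
    year: int            # publication year, not the collection or coverage date
    title: str           # include the edition or version number if applicable
    publisher: str       # data centre or institution that holds the data
    identifier: str      # ideally a DOI written as a URL

    def format(self) -> str:
        """Join the elements as 'Creator (Year): Title. Publisher. Identifier'."""
        names = "; ".join(self.creators)
        return f"{names} ({self.year}): {self.title}. {self.publisher}. {self.identifier}"


citation = DataCitation(
    creators=["Irino, T", "Tada, R"],
    year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127-797",
    publisher="Geological Institute, University of Tokyo",
    identifier="http://dx.doi.org/10.1594/PANGAEA.726855",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than storing one finished string, makes it easier to re-render the same citation in whatever style a particular journal requires.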
Data citation examples
- Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127‐797. Geological Institute, University of Tokyo. http://dx.doi.org/10.1594/PANGAEA.726855
- Geofon operator (2009): GEFON event gfz2009kciu (NW Balkan Region). GeoForschungsZentrum Potsdam (GFZ). http://dx.doi.org/10.1594/GFZ.GEOFON.gfz2009kciu
- Denhard, Michael (2009): dphase_mpeps: MicroPEPS LAF‐Ensemble run by DWD for the MAP D‐PHASE project. World Data Center for Climate. http://dx.doi.org/10.1594/WDCC/dphase_mpeps
- Haardt, H; Maaßen, R (1983): Physical oceanography from the Drake Passage and Bransfield Strait during Meteor cruise M56. Institut für Angewandte Physik, Christian-Albrechts-Universität, Kiel. http://dx.doi.org/10.1594/PANGAEA.73766
Examples taken from DataCite.
Citation formatting service
Use the DOI Citation Formatter, a service created in collaboration with CrossRef, to format your citation. This will ensure you adopt the correct format for your needs.
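Formatted citations can also be requested programmatically through DOI content negotiation, in which the DOI resolver is asked for a `text/x-bibliography` response in a named style. The following Python sketch assumes the `https://doi.org/` resolver and the `apa` style name; the function names `build_citation_request` and `fetch_citation` are hypothetical, and whether a given style is available depends on the service.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request for a DOI."""
    url = f"https://doi.org/{doi}"
    # The Accept header asks the resolver for a formatted bibliography entry.
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the formatted citation as text."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Example call (requires network access):
# print(fetch_citation("10.1594/PANGAEA.726855"))
```

Always check the returned text against the original data record, since automatically generated citations can omit or misorder elements.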
Challenges of data citation
- The most valuable datasets are often long-term ones that are still being updated. How is it possible to cite a dataset that is still being changed and added to (a "dynamic dataset")?
- Datasets can be enormous and may have hundreds or thousands of collaborators. If you only want to cite a smaller subsection (a "microcitation"), it is often difficult to identify the authors who need credit.
Such challenges are currently being discussed by interested parties and stakeholders such as the Research Data Alliance’s Data Citation Working Group and the CODATA Task Group on Data Citation.