Documentation and Metadata
What are documentation and metadata, and why should I consider creating them?
Why is it important?
Good documentation makes material understandable, verifiable, and reusable. Just making data available to others does not make it usable or useful. If you or someone else comes back to your data at a later time, you will need this documentation to understand when, why, and by whom the data was created, what methods were used, and what any acronyms or jargon mean.
Creating good metadata is part of good practice in Research Data Management.
What do Funding Bodies expect?
Research funders now require researchers to create documentation and metadata and to make them openly available, thereby facilitating access to, and re-use of, often complex datasets.
"Research data should be accompanied by high-quality documentation and metadata to provide secondary users with essential information to independently understand the data, enable discovery, and allow for scientific re-use. Documentation should describe at least the origin of data, fieldwork and data collection methods, processing and/or the researcher’s management of the data. Individual data items such as variables or transcripts should be clearly labelled and described." (ESRC Research Data Policy (pdf), Principle 3 Implementation Note, March 2015)
Please note that Lancaster University also expects researchers to create metadata that is "sufficient to enable other researchers to understand how it was created or acquired, and, if it is to be made openly available, to discover it and assess its reuse potential." (Lancaster University Research Data Management Policy, Word).
What does documentation include?
Supporting documentation can include information on:
- What hardware and software were used to create the data?
- What methodologies were used to create the data?
- What assumptions were made in your experiments?
- Why are there anomalies in your data?
When should I write research project documentation?
Much of what you need in project-level documentation will already have been included in the project application. Documentation content, such as the aims and objectives of the project, any hypotheses, and the methodologies used, can be created even before the project has begun, so writing it should not be very time-consuming.
What is metadata?
Metadata is often defined as data about data.
It is related to the broader contextual information that describes your data, but is usually more structured in that it conforms to set standards and is machine readable. One typical use of metadata is to create a catalogue record for a dataset held in an archive. Bibliographical metadata about a journal article is another example of metadata.
Metadata commonly includes information such as:
- Who created the data;
- Who published the data;
- An abstract of the data; and
- A description of the data.
What is "minimal" or "mandatory" metadata?
These are basic elements that most researchers consider "must-haves" for data documentation. Most metadata standards have formalised a set of mandatory metadata fields without which the metadata record is not valid. These often include Title, Creator, Description, Rights, and Date.
Lancaster University recommends the following minimal set of metadata fields to describe data sets:
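As a rough illustration of what "not valid without mandatory fields" can mean in practice, the sketch below checks a record for missing fields. The field names are simply the five examples mentioned above, written in lower case; the function name `missing_mandatory_fields` and the sample values are hypothetical, and a real metadata standard defines its own mandatory list and validation rules.

```python
# Mandatory field names taken from the examples in the text above.
# Assumption: a real metadata standard would supply its own list.
MANDATORY_FIELDS = ("title", "creator", "description", "rights", "date")


def missing_mandatory_fields(record, mandatory=MANDATORY_FIELDS):
    """Return the mandatory fields that are absent or empty in `record`."""
    missing = []
    for field in mandatory:
        value = record.get(field)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Placeholder record invented for illustration.
record = {
    "title": "Example survey responses",
    "creator": "A. Researcher",
    "description": "Anonymised responses to a hypothetical survey.",
    "date": "",  # left blank on purpose
}

print(missing_mandatory_fields(record))  # prints ['rights', 'date']
```

A record would only be treated as complete once this list comes back empty.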
| Field | Description |
|---|---|
| Title | A name or title by which a resource is known |
| Unique resource identifier | For your working data, this could be a project ID or a departmental identifier. Once you publish your data, the unique resource identifier will be a DOI (Digital Object Identifier) |
| Description | Description of the data set, like an abstract for a paper |
| Subject | Subject or classification code describing the resource, chosen from one or more authoritative sources |
| Creator(s) | The main researchers involved in producing the data, in priority order |
| Funder | Sources of financial support for the development of the resource, e.g. ESRC or Wellcome Trust |
| Resource Language | Default will be set to 'eng' (English) |
| Publication date | The date when the data was or will be made publicly available |
| Publisher | The name of the entity that holds, archives, publishes, prints, distributes, releases, issues, or produces the resource. For your working data this will be Lancaster University. |
| Contact email address | Person or service with knowledge of how to access, troubleshoot, or otherwise field issues related to the data set |
You will need these metadata when you enter information about your data into Pure or when you deposit your data set in a data centre as required by your funding body, so it will save you time if you have them ready early in your project.
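One possible way to have these fields ready early is to keep them in a small machine-readable file next to your working data. The sketch below is only an illustration: the key spellings, the helper name `to_json`, and every value are placeholders, and it does not reflect the format that Pure or any particular data centre expects.

```python
import json

# Keys mirror the minimal fields in the table above. The key spellings and
# all values are placeholders invented for illustration.
minimal_record = {
    "title": "Example interview transcripts",
    "identifier": "PROJECT-0001",  # hypothetical project ID; a DOI once published
    "description": "Transcripts from a hypothetical interview study.",
    "subject": ["Sociology"],
    "creators": ["A. Researcher", "B. Researcher"],  # in priority order
    "funder": "ESRC",
    "language": "eng",
    "publication_date": "2025-01-31",
    "publisher": "Lancaster University",
    "contact_email": "rdm-contact@example.org",
}


def to_json(record):
    """Serialise a metadata record as indented, human-readable JSON."""
    return json.dumps(record, indent=2, ensure_ascii=False)


print(to_json(minimal_record))
```

Keeping the record as plain JSON means it can be read by people and by software alike, and copied into a deposit form when the time comes.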
Many research domains, research data repositories, and funding agencies have specific requirements for metadata and data documentation. Please contact Lancaster RDM support if you have questions regarding your data's metadata.
Good metadata (often called "rich" metadata) provides relevant context for research data, helps track its provenance and, in the longer term, makes the data easier for you and others to find and use. The "minimal" metadata fields above can be enhanced with further relevant information, such as:
- Version number;
- Collection dates;
- Geographic coverage;
- Item embargo;
- Item MIME type;
- Related resources; and
- Access restrictions.
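Some of these richer fields can be filled in with little effort. The sketch below adds a version number, collection dates, and a MIME type to an existing record, guessing the MIME type from the file name with Python's standard `mimetypes` module. The function name `add_rich_fields`, the key names, and the sample values are assumptions made for illustration only.

```python
import mimetypes


def add_rich_fields(record, data_file, version, collected_from, collected_to):
    """Return a copy of `record` with a few optional 'rich' fields added."""
    # guess_type() infers the MIME type from the file extension only;
    # it does not open or inspect the file itself.
    mime_type, _encoding = mimetypes.guess_type(data_file)
    enriched = dict(record)  # copy, so the original record is left untouched
    enriched.update({
        "version": version,
        "collection_dates": {"start": collected_from, "end": collected_to},
        # Fall back to a generic type when the extension is not recognised.
        "mime_type": mime_type or "application/octet-stream",
    })
    return enriched


# Placeholder values invented for illustration.
base = {"title": "Example survey responses", "creator": "A. Researcher"}
rich = add_rich_fields(base, "responses.csv", "1.0", "2024-03-01", "2024-06-30")
print(rich)
```

Fields such as embargo periods, related resources, and access restrictions usually need a human decision, so they are left out of this sketch.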