The Data Documentation Initiative (DDI) is a suite of products for describing metadata about both quantitative and qualitative research data in the social, behavioral, economic, and health sciences. The DDI suite is a set of free standards for documenting and managing the different stages of the research data lifecycle, including conceptualization, collection, processing, distribution, discovery, and archiving.
DDI covers the following content areas:
- Conceptual objects: concept, unit, unit type, universe, population, geographic structures, and representation
- Methodological objects: approaches to sample selection, data capture, weighting, quality control, and process management
- Processing: data capture, data processing, analysis, and data management
- Quantitative and qualitative data objects: concept, universe, representation, usage, data type, record, record relationships, storage, access, and descriptive statistics
- Data management: ownership, access, rights management, restrictions, quality standards, organization, agent management, relationship between products, versioning, and provenance
Products within the DDI suite differ in terms of their area of coverage within DDI, supported activities, and required level of infrastructure. From simple descriptive content for human understanding to structures that support metadata-driven statistics production and analysis, DDI addresses a broad area of data management needs. As a suite of standards, DDI provides a common means of identification for information objects, support for common cross-product content, and an informed means of transforming content between products.
Current DDI Products
[DDI also has several products under development. Descriptions of those products are found here.]
DDI Codebook - Structured, descriptive documentation of the content, meaning, provenance, and access for a single data set.
DDI Lifecycle - Lifecycle expands on the idea of Codebook in terms of content coverage, depth, metadata management over time, reusable metadata, and support for the planning, capture, processing, storage, discovery and dissemination of data. It allows grouping and comparing related studies or series of studies.
Controlled Vocabularies - A set of controlled vocabularies commonly used in social science and other disciplines to support systems designed to identify, locate, and access data for research purposes.
XKOS - Extended Knowledge Organization System (XKOS) leverages the Simple Knowledge Organization System (SKOS) for managing statistical classifications and concept management systems. XKOS adds the extensions that are needed to meet the requirements of the statistical community.
SDTL - Structured Data Transformation Language (SDTL) is an independent intermediate language for representing data transformation commands.
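
To make the XKOS entry above more concrete, the following minimal sketch shows a two-level classification expressed first with plain SKOS terms and then with a few XKOS additions. It is an illustration under stated assumptions, not an excerpt from any published classification: the `http://example.org/classification/` namespace, the codes `A` and `A01`, and the helper `concepts_at_depth` are hypothetical, and the triples are held as plain Python tuples rather than in an RDF library. The XKOS terms used (`xkos:ClassificationLevel`, `xkos:depth`, `xkos:specializes`) should be checked against the XKOS specification before being relied on.

```python
# Illustrative only: RDF-style triples stored as plain (subject, predicate, object) tuples.
SKOS = "http://www.w3.org/2004/02/skos/core#"
XKOS = "http://rdf-vocabulary.ddialliance.org/xkos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/classification/"  # hypothetical namespace

triples = [
    # Plain SKOS: a concept scheme, two concepts, and a generic hierarchy link.
    (EX + "scheme", RDF_TYPE, SKOS + "ConceptScheme"),
    (EX + "A", RDF_TYPE, SKOS + "Concept"),
    (EX + "A01", RDF_TYPE, SKOS + "Concept"),
    (EX + "A", SKOS + "inScheme", EX + "scheme"),
    (EX + "A01", SKOS + "inScheme", EX + "scheme"),
    (EX + "A01", SKOS + "broader", EX + "A"),
    # XKOS additions (assumed usage): explicit classification levels with a depth.
    (EX + "level1", RDF_TYPE, XKOS + "ClassificationLevel"),
    (EX + "level1", XKOS + "depth", 1),
    (EX + "level1", SKOS + "member", EX + "A"),
    (EX + "level2", RDF_TYPE, XKOS + "ClassificationLevel"),
    (EX + "level2", XKOS + "depth", 2),
    (EX + "level2", SKOS + "member", EX + "A01"),
    # XKOS addition (assumed usage): a more specific relation than skos:broader.
    (EX + "A01", XKOS + "specializes", EX + "A"),
]


def concepts_at_depth(graph, depth):
    """Return the sorted members of every classification level with the given depth."""
    levels = {s for (s, p, o) in graph if p == XKOS + "depth" and o == depth}
    return sorted(o for (s, p, o) in graph if p == SKOS + "member" and s in levels)


second_level = concepts_at_depth(triples, 2)
```

The point of the sketch is the division of labor: SKOS supplies the concepts and the generic hierarchy, while the XKOS terms carry the level structure and the more specific relation between the two items.
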
Product | Description | Supported Activities | Points of Contact with Other DDI Products | Available Metadata Syntax Representations |
---|---|---|---|---|
Codebook | Originally developed as an XML DTD, Codebook retains the hierarchical structure of a DTD in describing the contents of a descriptive codebook for a data set, including identification, authorship, ownership, purpose, background methodologies, source information, provenance, quality control, access, physical file structures, variables and variable groupings, and related materials. Extensive information is found within the variable description, covering the data source, derivation activity, representation, data typing, variable role, and restrictions. Content coverage: Codebook covers all major content areas but is generally limited to descriptive narrative. | | | |
Lifecycle | Lifecycle expands on the idea of Codebook in terms of content coverage, depth, metadata management over time, reusable metadata, and support for the planning, capture, processing, storage, discovery, and dissemination of research data. Lifecycle is the most comprehensive of the DDI products, covering conceptual and methodological objects, processing, quantitative and qualitative data objects, and data management. Lifecycle is appropriate for longitudinal, linked, and other complex datasets. | | | |
Controlled Vocabularies | A set of controlled vocabularies commonly used in social science research. They support systems designed to identify, locate, and access data for research purposes. Content coverage is driven by the needs of the DDI community, but use is not limited to this community. | | | |
XKOS | XKOS extends the Simple Knowledge Organization System (SKOS) for the needs of statistical classifications. It does so in two main directions. First, it defines a number of terms that enable the representation of statistical classifications with their structure and textual properties, as well as the relations between classifications. Second, it refines SKOS semantic properties to allow the use of more specific relations between concepts. Those specific relations can be used for the representation of classifications or for any other case where SKOS is employed. XKOS adds the extensions that are desirable to meet the requirements of the statistical community. | | | |
SDTL | Structured Data Transformation Language (SDTL) is an independent intermediate language for representing data transformation commands. Statistical analysis packages (e.g., SPSS, Stata, SAS, and R) provide similar functionality, but each one has its own proprietary language. SDTL consists of JSON schemas for common operations, such as RECODE, MERGE FILES, and VARIABLE LABELS. SDTL provides machine-actionable descriptions of variable-level data transformation histories derived from any data transformation language. Provenance metadata represented in SDTL can be added to documentation in DDI and other metadata standards. | | | |
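
The idea behind the SDTL row above, a package-neutral JSON description of a transformation command, can be sketched as follows. This is a simplified illustration, not the SDTL schema: the field names (`command`, `sourceVariable`, `targetVariable`, `rules`, `from`, `to`, `value`), the variable names, and the helper `apply_recode` are all hypothetical and chosen only to show how a recode written in any statistical package could be captured once and then interpreted by a machine.

```python
import json

# Hypothetical, simplified recode description. The field names are illustrative
# and are NOT taken from the published SDTL JSON schemas.
recode_json = """
{
  "command": "Recode",
  "sourceVariable": "age",
  "targetVariable": "age_group",
  "rules": [
    {"from": 0, "to": 17, "value": 1},
    {"from": 18, "to": 64, "value": 2},
    {"from": 65, "to": 120, "value": 3}
  ]
}
"""


def apply_recode(command, rows):
    """Apply a range-based recode description to a list of row dictionaries."""
    source = command["sourceVariable"]
    target = command["targetVariable"]
    result = []
    for row in rows:
        new_row = dict(row)          # leave the input rows unmodified
        new_row[target] = None       # unmatched or missing values stay None
        value = row.get(source)
        if value is not None:
            for rule in command["rules"]:
                if rule["from"] <= value <= rule["to"]:
                    new_row[target] = rule["value"]
                    break            # first matching rule wins
        result.append(new_row)
    return result


command = json.loads(recode_json)
rows = [{"age": 12}, {"age": 40}, {"age": 70}, {"age": None}]
recoded = apply_recode(command, rows)
```

Because the description is data rather than package-specific syntax, the same JSON could in principle be stored alongside variable-level documentation as a record of how `age_group` was derived.
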