Data Lineage

Data Lineage Provides Intelligence on Data Quality and Provenance

While data flow diagrams depict how information flows through a company, data lineage adds intelligence about that data, such as its integrations, quality, and processing. The result is a complete picture of the data lifecycle: where the data originated and how it was transformed into its final form. Data lineage can also support compliance readiness by showing which processes use and transform personal data.

Data lineage differs from data inventory in its focus. Lineage draws on descriptive information, called metadata, and is concerned with the data lifecycle: how data is used and transformed as it moves through the organization. Data inventory focuses on the state of data at various locations and on the statistical aspects of the data (such as risk and classification).

What is Data Lineage Used For?

  • Baselining data lifecycles: how data originates, where it travels, and how it is combined and consumed.
  • Analyzing data quality to identify opportunities to improve processes and applications for data accuracy and completeness.
  • Supporting data governance and compliance: organizations can use lineage to understand which business processes consume critical data assets and/or personal information.
  • Assessing business impact: knowing what will be affected if data content or structure is changed.
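
The lifecycle and impact questions in the list above can be illustrated with a small sketch. It assumes lineage is stored as a directed graph in which an edge from a source to a target means the target is derived from the source. The class name `LineageGraph`, its methods, and the dataset names are hypothetical and chosen only for illustration; real metadata management tools and data catalogs expose their own models.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage store (hypothetical): an edge source -> target means
    the target dataset is derived from the source dataset."""

    def __init__(self):
        self._downstream = defaultdict(set)  # source -> datasets derived from it
        self._upstream = defaultdict(set)    # target -> datasets it is derived from
        self.transforms = {}                 # (source, target) -> description

    def add_edge(self, source, target, transform=""):
        self._downstream[source].add(target)
        self._upstream[target].add(source)
        self.transforms[(source, target)] = transform

    @staticmethod
    def _walk(start, neighbors):
        # Breadth-first traversal collecting every dataset reachable from start.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impacted_by(self, dataset):
        """Impact assessment: everything downstream of the dataset."""
        return self._walk(dataset, self._downstream)

    def origins_of(self, dataset):
        """Provenance: everything upstream of the dataset."""
        return self._walk(dataset, self._upstream)


# Illustrative (made-up) datasets and transformations.
graph = LineageGraph()
graph.add_edge("crm.customers", "staging.customers_clean", "deduplicate")
graph.add_edge("web.orders", "staging.orders_clean", "validate")
graph.add_edge("staging.customers_clean", "mart.customer_orders", "join")
graph.add_edge("staging.orders_clean", "mart.customer_orders", "join")
graph.add_edge("mart.customer_orders", "report.monthly_revenue", "aggregate")

print(sorted(graph.impacted_by("crm.customers")))
# ['mart.customer_orders', 'report.monthly_revenue', 'staging.customers_clean']
print(sorted(graph.origins_of("report.monthly_revenue")))
```

Walking downstream answers the impact question (what is affected if `crm.customers` changes), while walking upstream answers the provenance question (where the data in a report began).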

What Tools Can Create Data Lineage Reports and Visualizations?

  • Metadata management tools.
  • Data architecture solutions.
  • Data catalogs.
  • Manual processes via surveys and direct data source research and queries.

Data lineage is enriched by a complete and accurate data inventory. Data discovery and classification solutions can provide a foundation for creating data lineage by helping populate data catalogs and other tools.
