
Lineage

Lineage records where each artifact came from and which transformations produced it, so its provenance can be followed through the pipeline.

Why Lineage?

You cannot reason about outputs unless you can reason about inputs.

Lineage provides:

  • Auditability
  • Reproducibility
  • Debugging
  • Compliance

Lineage Message

protobuf
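message Lineage {
  string id = 1;                                // ID of this lineage record
  string artifact_id = 2;                       // ID of the artifact being described
  string artifact_type = 3;                     // Kind of artifact
  Source source = 4;                            // Where the data originated
  repeated Transformation transformations = 5;  // Steps applied to produce the artifact
}

message Source {
  string source_type = 1;   // Source system, e.g. "SEC_EDGAR"
  string source_id = 2;     // ID of the item at the source
  string source_url = 3;    // URL of the source
  string content_hash = 4;  // Hash of the source content, e.g. "sha256:a1b2c3"
}

message Transformation {
  string name = 1;                // Name of the step, e.g. "parse_form4"
  string version = 2;             // Version of the step, e.g. "1.0.0"
  repeated string input_ids = 3;  // IDs of the inputs this step consumed
}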
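
One way to catch misspelled keys before a record is stored is to compare its keys against the field names in the messages above. This is a minimal sketch: `unknown_fields` is a hypothetical helper, not part of any published API, and it checks key names only, not value types.

```python
# Field names copied from the Lineage, Source, and Transformation messages.
LINEAGE_FIELDS = {"id", "artifact_id", "artifact_type", "source", "transformations"}
SOURCE_FIELDS = {"source_type", "source_id", "source_url", "content_hash"}
TRANSFORMATION_FIELDS = {"name", "version", "input_ids"}


def unknown_fields(record):
    """Return dotted paths of keys that the messages do not define (hypothetical helper)."""
    problems = [key for key in record if key not in LINEAGE_FIELDS]
    problems += [
        f"source.{key}"
        for key in record.get("source", {})
        if key not in SOURCE_FIELDS
    ]
    for index, step in enumerate(record.get("transformations", [])):
        problems += [
            f"transformations[{index}].{key}"
            for key in step
            if key not in TRANSFORMATION_FIELDS
        ]
    return problems
```

An empty list means every key maps to a defined field. Absent fields are not reported, since a record may omit fields it does not need.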

Example

json
{
  "id": "lineage:evt:123",
  "artifact_id": "sec:form4:abc",
  "source": {
    "source_type": "SEC_EDGAR",
    "source_url": "https://sec.gov/...",
    "content_hash": "sha256:a1b2c3"
  },
  "transformations": [
    {
      "name": "parse_form4",
      "version": "1.0.0",
      "input_ids": ["raw:sec:abc"]
    }
  ]
}
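
A record like the one above could be assembled along these lines. This is a sketch under stated assumptions, not the pipeline's actual code: `content_hash` and `make_lineage` are hypothetical helpers, the `sha256:<hex>` format is inferred from the example value, and the raw bytes are a placeholder.

```python
import hashlib
import json


def content_hash(raw: bytes) -> str:
    """Return 'sha256:<hex digest>' for the raw source bytes (assumed format)."""
    return "sha256:" + hashlib.sha256(raw).hexdigest()


def make_lineage(lineage_id, artifact_id, source_type, source_url, raw, steps):
    """Build a dict shaped like the Lineage message (hypothetical helper).

    `steps` is a list of (name, version, input_ids) tuples.
    """
    return {
        "id": lineage_id,
        "artifact_id": artifact_id,
        "source": {
            "source_type": source_type,
            "source_url": source_url,
            "content_hash": content_hash(raw),
        },
        "transformations": [
            {"name": name, "version": version, "input_ids": list(input_ids)}
            for name, version, input_ids in steps
        ],
    }


record = make_lineage(
    "lineage:evt:123",
    "sec:form4:abc",
    "SEC_EDGAR",
    "https://sec.gov/...",
    b"<raw filing bytes>",  # placeholder for the fetched source content
    [("parse_form4", "1.0.0", ["raw:sec:abc"])],
)
print(json.dumps(record, indent=2))
```

Hashing the raw bytes at ingestion means the same source can be re-fetched later and compared against `content_hash` to check that it has not changed.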

Use Cases

  • Trace data back to source
  • Identify affected outputs when sources change
  • Prove data provenance for audits
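
The first two use cases amount to walking the graph formed by `artifact_id` and `input_ids`. The sketch below assumes all lineage records are available in memory as dicts shaped like the JSON example. The second record (`signal:insider:abc`, `score_insider`) is invented for illustration, and `build_edges` and `reachable` are hypothetical helpers.

```python
from collections import defaultdict, deque

# Illustrative records; only the first mirrors the example in this document.
records = [
    {
        "artifact_id": "sec:form4:abc",
        "transformations": [
            {"name": "parse_form4", "version": "1.0.0", "input_ids": ["raw:sec:abc"]}
        ],
    },
    {
        "artifact_id": "signal:insider:abc",  # hypothetical downstream artifact
        "transformations": [
            {"name": "score_insider", "version": "0.1.0", "input_ids": ["sec:form4:abc"]}
        ],
    },
]


def build_edges(records):
    """Map each artifact to its direct inputs (parents) and direct outputs (children)."""
    parents, children = defaultdict(set), defaultdict(set)
    for rec in records:
        for step in rec.get("transformations", []):
            for input_id in step.get("input_ids", []):
                parents[rec["artifact_id"]].add(input_id)
                children[input_id].add(rec["artifact_id"])
    return parents, children


def reachable(start, edges):
    """Breadth-first walk returning every node reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


parents, children = build_edges(records)
upstream = reachable("signal:insider:abc", parents)   # trace back to source
affected = reachable("raw:sec:abc", children)         # outputs to recheck if the source changes
```

Walking `parents` answers "where did this come from?", and walking `children` answers "what depends on this?". A production system would more likely query a store than hold every record in memory.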
