Stanford Panda Project
PANDA: A System for Provenance and Data
In its most general form, provenance (also sometimes called lineage) captures where data came from, how it was derived, manipulated, and combined, and how it has been updated over time. Provenance can serve a number of important functions:
• Explanation. Users may be particularly interested in or wary of specific portions of a derived data set. Provenance supports "drilling down" to examine the sources and evolution of data elements of interest, enabling a deeper understanding of the data.
• Verification. Derived data may appear suspect, whether because of bugs in data processing and manipulation, because the data may be stale, or even because of malice. Provenance enables auditing how data was produced, either to verify its correctness or to identify the erroneous or outdated source data or processing nodes responsible for erroneous or outdated output data.
• Recomputation. Having found outdated or incorrect source data, or buggy processing nodes, we may want to correct the errors and propagate the corrections forward to all "downstream" data that are affected. Provenance helps us recompute only those data elements that are affected by the corrections.
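To make the three functions above concrete, here is a minimal sketch of a provenance graph. It is not Panda's actual interface: the `ProvenanceGraph` class, its `record`, `backward`, and `forward` methods, and the element names are all hypothetical, and it assumes provenance is stored simply as "this output was derived from these inputs". Backward tracing corresponds to explanation and verification; forward tracing finds the downstream elements that would need recomputation.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy lineage store (hypothetical, not Panda's API)."""

    def __init__(self):
        self.parents = defaultdict(set)   # element -> elements it was derived from
        self.children = defaultdict(set)  # element -> elements derived from it

    def record(self, output, inputs):
        """Record that `output` was derived from each element in `inputs`."""
        for src in inputs:
            self.parents[output].add(src)
            self.children[src].add(output)

    @staticmethod
    def _reach(start, edges):
        """Return every element reachable from `start` by following `edges`."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def backward(self, element):
        """Explanation / verification: everything `element` was derived from."""
        return self._reach(element, self.parents)

    def forward(self, element):
        """Recomputation: everything downstream of `element`."""
        return self._reach(element, self.children)


# Hypothetical pipeline: two raw sources, two cleaned sets, two derived outputs.
graph = ProvenanceGraph()
graph.record("clean_a", ["raw_a"])
graph.record("clean_b", ["raw_b"])
graph.record("report", ["clean_a", "clean_b"])
graph.record("summary_b", ["clean_b"])

print(sorted(graph.backward("report")))  # ['clean_a', 'clean_b', 'raw_a', 'raw_b']
print(sorted(graph.forward("raw_b")))    # ['clean_b', 'report', 'summary_b']
```

In this sketch, correcting `raw_b` would require recomputing only `clean_b`, `report`, and `summary_b`; `clean_a` is untouched because it does not appear in the forward trace.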