Provenance in a Modifiable Data Set

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8000)

Abstract

Provenance of data is now widely recognized as being of great importance, thanks in large part to pioneering work [4, 6] by Peter Buneman and his collaborators in a stream that continues to produce influential papers today [1-3, 7]. When we consume data from a database, we often care about where these data come from, how they were derived, and so forth. We may desire answers to such questions to establish trust in the data, to investigate suspicious values, to debug code in the system, or for a host of other reasons. Considerable recent work has addressed many issues related to provenance. However, the standard assumption is that data sources, from which result data have been derived, are static. In reality, we know that most data are modified over time, including data sources used for deriving results of interest. When we consider provenance in the context of such modifications, many new problems arise. This chapter addresses two key problems in this context:

  1. Result data may no longer be valid after a source update. How can we efficiently determine whether a given result tuple is valid? When a result tuple is invalidated, can we explain what caused this invalidation?

  2. We may have lost access to (some) source data. In such a situation, can we determine what is the missing source data on which some result tuple depends?

References

  1. Buneman, P., Chapman, A., Cheney, J.: Provenance management in curated databases. In: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pp. 539–550 (2006)

  2. Buneman, P., Cheney, J., Lindley, S., Müller, H.: DBWiki: A structured wiki for curated data and collaborative data management. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, pp. 1335–1338 (2011)

  3. Buneman, P., Cheney, J., Tan, W.-C., Vansummeren, S.: Curated databases. In: Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp. 1–12 (2008)

  4. Buneman, P., Khanna, S., Tan, W.-C.: Data provenance: Some basic issues. In: Foundations of Software Technology and Theoretical Computer Science, pp. 87–93 (2000)

  5. Buneman, P., Khanna, S., Tajima, K., Tan, W.-C.: Archiving scientific data. ACM Trans. Database Syst. 29, 2–42 (2004)

  6. Buneman, P., Khanna, S., Tan, W.-C.: Why and where: A characterization of data provenance. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 316–330. Springer, Heidelberg (2000)

  7. Buneman, P., Tan, W.-C.: Provenance in databases. In: Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 1171–1173 (2007)

  8. Chapman, A., Jagadish, H.V.: Why not? In: Proceedings of the 35th SIGMOD International Conference on Management of Data, pp. 523–534 (2009)

  9. Cui, Y., Widom, J.: Practical lineage tracing in data warehouses. In: Proceedings of the 15th International Conference on Data Engineering, pp. 367–378 (1999)

  10. Green, T.J., Karvounarakis, G., Ives, Z.G., Tannen, V.: Update exchange with mappings and provenance. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 675–686 (2007)

  11. Gupta, A., Mumick, I.S., Subrahmanian, V.S.: Maintaining views incrementally. In: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 157–166 (1993)

  12. Herschel, M., Hernández, M.A.: Explaining missing answers to SPJUA queries. Proc. VLDB Endow. 3, 185–196 (2010)

  13. Huang, J., Chen, T., Doan, A., Naughton, J.F.: On the provenance of non-answers to queries over extracted data. Proc. VLDB Endow. 1(1), 736–747 (2008)

  14. Meliou, A., Gatterbauer, W., Moore, K.F., Suciu, D.: Why so? or why no? Functional causality for explaining query answers. In: CoRR (2009)

  15. Müller, H., Buneman, P., Koltsidas, I.: XArch: Archiving scientific and reference data. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1295–1298 (2008)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Zhang, J., Jagadish, H.V. (2013). Provenance in a Modifiable Data Set. In: Tannen, V., Wong, L., Libkin, L., Fan, W., Tan, WC., Fourman, M. (eds) In Search of Elegance in the Theory and Practice of Computation. Lecture Notes in Computer Science, vol 8000. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41660-6_31

  • DOI: https://doi.org/10.1007/978-3-642-41660-6_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41659-0

  • Online ISBN: 978-3-642-41660-6

  • eBook Packages: Computer Science, Computer Science (R0)
