Generalized UDF for Analytics Inside Database Engine | SpringerLink
Skip to main content

Generalized UDF for Analytics Inside Database Engine

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6184))

Included in the following conference series:

Abstract

Running analytics computation inside a database engine through the use of UDFs (User Defined Functions) has been investigated, but not yet become a scalable approach due to several technical limitations. One limitation lies in the lack of generality for UDFs to express complex applications and to compose them with relational operators in SQL queries. Another limitation lies in the lack of systematic support for a UDF to cache relations initially for efficient computation in multi-calls. Further, having UDF execution interacted efficiently with query processing requires detailed system programming, which is often beyond the expertise of most application developers.

To solve these problems, we extend the UDF technology in both semantic and system dimensions. We generalize UDF to support scalar, tuple as well as relation input and output, allow UDFs to be defined on the entire content of relations and allow the moderate-sized input relations to be cached in initially to avoid repeated retrieval. With such extension the generalized UDFs can be composed with other relational operators and thus integrated into queries naturally. Furthermore, based on the notion of invocation patterns, we provide focused system support for efficiently interacting UDF execution with query processing.

We have taken the open-sourced PostgreSQL engine and a commercial and proprietary parallel database engine as our prototyping vehicles; we illustrated the performance, modeling power and usability of the proposed approach with the experimental results on both platforms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 11439
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 14299
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Argyros, T.: How Aster In-Database MapReduce Takes UDF’s to the next Level (2008), http://www.asterdata.com/

  2. Bryant, R.E.: Data-Intensive Supercomputing: The case for DISC, CMU-CS-07-128 (2007)

    Google Scholar 

  3. Chen, Q., Hsu, M.: Data-Continuous SQL Process Model. In: Proc. 16th International Conference on Cooperative Information Systems, CoopIS 2008 (2008)

    Google Scholar 

  4. Chen, Q., Hsu, M., Liu, R., Wang, W.: Scaling-up and Speeding-up Video Analytics Inside Database Engine. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds.) DEXA 2009. LNCS, vol. 5690, pp. 244–254. Springer, Heidelberg (2009)

    Google Scholar 

  5. Chen, Q., Hsu, M., Liu, R.: Extend UDF Technology for Integrated Analytics. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds.) DaWak 2009. LNCS, vol. 5691, pp. 256–270. Springer, Heidelberg (2009)

    Google Scholar 

  6. Chen, Q., Therber, A., Hsu, M., Zeller, H., Zhang, B., Wu, R.: Efficiently Support Map-Reduce alike Computation Models Inside Parallel DBMS. In: IDEAS 2009 (2009)

    Google Scholar 

  7. Chen, Q., Hsu, M.: Inter-Enterprise Collaborative Business Process Management. In: Proc. of 17th Int’l Conf. on Data Engineering (ICDE 2001), Germany (2001)

    Google Scholar 

  8. Cooper, B.F., et al.: PNUTS: Yahoo!’s Hosted Data Serving Platform. In: VLDB 2008 (2008)

    Google Scholar 

  9. Dayal, U., Hsu, M., Ladin, R.: A Transaction Model for Long-Running Activities. In: VLDB 1991 (1991) (received 10 years award in 2001)

    Google Scholar 

  10. Dean, J.: Experiences with MapReduce, an abstraction for large-scale computation. In: Int. Conf. on Parallel Architecture and Compilation Techniques. ACM, New York (2006)

    Google Scholar 

  11. DeWitt, D.J., Paulson, E., Robinson, E., Naughton, J., Royalty, J., Shankar, S., Krioukov, A.: Clustera: An Integrated Computation and Data Management System. In: VLDB 2008 (2008)

    Google Scholar 

  12. HDFS, http://hdf.ncsa.uiuc.edu/HDF5/

  13. Jaedicke, M., Mitschang, B.: User-Defined Table Operators: Enhancing Extensibility of ORDBMS. In: VLDB 1999 (1999)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hsu, M., Chen, Q., Wu, R., Zhang, B., Zeller, H. (2010). Generalized UDF for Analytics Inside Database Engine. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_70

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-14246-8_70

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14245-1

  • Online ISBN: 978-3-642-14246-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics