LIFOSS: a learned index scheme for streaming scenarios | World Wide Web Skip to main content
Log in

LIFOSS: a learned index scheme for streaming scenarios

  • Published:
World Wide Web Aims and scope Submit manuscript

Abstract

Recently, researches on dynamic decision-making based on streaming data are in full swing. As an indispensable technology for data management and analysis, indexing methods are also evolving. The indexing paradigm named learned index has been proposed to replace the traditional B-tree family in some scenarios. It has been proved that learned indexes can provide higher lookup efficiency and lower storage cost overhead than traditional indexes. Usually, learned indexes assume that the data is static or at least the data distribution is unchanged. However, the streaming scenarios break the strong assumption. This paper presents a learned index scheme for streaming scenarios (LIFOSS for short), where the workloads insert, delete, and query data arbitrarily. Precisely, LIFOSS consists of three parts: a) an adaptive packed-memory array which stores data and handles updates with lower bound of performance guaranteed; b) a middle-layer model group, used to fit the cumulative distribution function of data; c) a feedback mechanism designed to update parameters of the model group above in real-time locally. Extensive experiments on two public datasets show that LIFOSS performs better than the state-of-the-art dynamic learned index method. LIFOSS reduces the lookup latency by at least \(6\%\), and its dynamic performance is more stable, requiring only a tiny amount of extra space.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

References

  1. Bender, M.A., Hu, H.: An adaptive packed-memory array. ACM Trans. Database Syst. 32(4), 26 (2007)

    Article  Google Scholar 

  2. BFerragina, P., HVinciguerra, G.: The pgm-index: a fully-dynamic compressed learned index with provable worst-case bounds. VLDB 13(8), 1162–1175 (2020)

    Google Scholar 

  3. Chen, J., Zhong, M., Li, J., Wang, D., Qian, T., Tu, H.: Effective deep attributed network representation learning with topology adapted smoothing. IEEE Trans Cybern., https://doi.org/10.1109/TCYB.2021.3064092 (2021)

  4. Davitkova, A., Milchevski, E., Michel, S.: The ML-Index: a multidimensional, learned index for point, range, and nearest-neighbor queries. In: EDBT, pp. 407–410 (2020)

  5. Ding, J., Minhas, U.F., Yu, J., Wang, C., Do J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D.B., Kraska, T., Fu, X., Xu, J., Lu, H.: ALEX: An updatable adaptive learned index. In: SIGMOD, pp. 969–984 (2019)

  6. Ding, J., Nathan, V., Alizadeh, M., Kraska, T.: Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. VLDB 14(2), 74–86 (2020)

    Google Scholar 

  7. Ferragina, P., Lillo, F., Vinciguerra, G.: Why are learned indexes so effective?. In: ICML, pp. 3123–3132 (2020)

  8. Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., Kraska, T.: FITing-Tree: a data-aware index structure. In: SIGMOD, pp. 1189–1206 (2019)

  9. Gao, Y., Ye, J., Gao, X., Chen, G.: Middle layer based scalable learned index scheme. Journal of Software 31(3), 620–633 (2020)

    Google Scholar 

  10. He, Y., Sick, B.: Clear: An adaptive continual learning framework for regression tasks. arxiv:2101.00926 (2021)

  11. Huszár, R.: On quadratic penalties in elastic weight consolidation. arxiv:1712.03847 (2017)

  12. Kipf, A., Marcus, R., Renen, A.V., Stoian, M., Kemper, A., Kraska, T., Neumann, T.: Sosd: A benchmark for learned indexes. arxiv:1911.13014 (2019)

  13. Kraska, T., Alizadeh, M., Beutel, A., Chi, E.H., Kristo, A., Leclerc, G., Madden, S., Mao, H., Nathan, V.: SageDB: a learned database system. In: CIDR (2019)

  14. Kraska, T., Beutel, A., Chi, E.H., Dean, J., Polyzotis, N.: The case for learned index structures. In: SIGMOD, pp. 489–504 (2018)

  15. Leo, D.D., Boncz, P.A.: Packed memory arrays - rewired. In: ICDE, pp. 830–841 (2019)

  16. Li, J., Cai, T., Ajmal, M., Li, R., Timos, S., Yu, J.: Holistic influence maximization for targeted advertisements in spatial social. In: ICDE, pp. 1340–1343 (2018)

  17. Li, P., Lu, H., Zheng, Q., Yang, L., Pan, G.: LISA: A learned index structure for spatial data. In: SIGMOD, pp. 2119–2133 (2020)

  18. Li, G., Zhou, X., Cao, L.: AI meets database: AI4DB and DB4AI. In: SIGMOD, pp. 2859–2866 (2021)

  19. Li, Z., Wang, X., Li, J., Chen, Y., Zhang, Q.: Deep attributed network representation learning of complex coupling interaction. Knowl. Based Syst 212, 106618 (2021)

    Article  Google Scholar 

  20. Nathan, V., Ding, J., Alizadeh, M., Kraska, T.: Learning multi-dimensional indexes. In: SIGMOD, pp. 985–1000 (2020)

  21. Pandey, V., Renen, A.V., Kipf, A., Ding, J., Sabek, I., Kemper, A.: The case for learned spatial indexes. In: VLDB, pp. 2119–2133 (2020)

  22. Patrick, E., Cheng, E., Gawlick, D., Elizabeth, J.: The log-structured merge-tree (lsm-tree). Acta Informatica 33(4), 351–385 (1996)

    Article  MATH  Google Scholar 

  23. Schwarz, J., Czarnecki, W., Luketina, J., Teh, Y.W., Pascanu, R., Hadsell, R.: Progress & Compress: A scalable framework for continual learning. In: ICML, pp. 4535–4544 (2018)

  24. Wang, H., Fu, X., Xu, J., Lu, H.: Learned index for spatial queries. In: MDM, pp. 569–574 (2019)

  25. Wu, J., Zhang, Y., Chen, S., Chen, Y., Wang, J., Xing, C.: Updatable learned index with precise positions. VLDB 14(8), 1276–1288 (2021)

    Google Scholar 

  26. Xue, G., Zhong, M., Li, J., Chen, J., Zhai, C., Kong, R.: Dynamic network embedding survey. arxiv:2103.15447 (2021)

  27. Yang, Y., Guan, Z., Li, J., Zhao, W., Cui, J., Wang, Q.: Interpretable and efficient heterogeneous graph convolutional network. In: TKDE, https://doi.org/10.1109/TKDE.2021.3101356 (2021)

Download references

Funding

This work was supported by the Natural Science Foundation of Jiangsu Province, China (Grant numbers BK20211307) and the Major Program of the Natural Science Foundation of Jiangsu Higher Education Institutions of China (Grant Nos. 18KJA520010, 19KJA610002).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to An Liu.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

This article belongs to the Topical Collection: Special Issue on Decision Making in Heterogeneous Network Data Scenarios and Applications

Guest Editors: Jianxin Li, Chengfei Liu, Ziyu Guan, and Yinghui Wu

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yu, T., Liu, G., Liu, A. et al. LIFOSS: a learned index scheme for streaming scenarios. World Wide Web 26, 501–518 (2023). https://doi.org/10.1007/s11280-022-01021-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11280-022-01021-6

Keywords

Navigation