Abstract
Recently, researches on dynamic decision-making based on streaming data are in full swing. As an indispensable technology for data management and analysis, indexing methods are also evolving. The indexing paradigm named learned index has been proposed to replace the traditional B-tree family in some scenarios. It has been proved that learned indexes can provide higher lookup efficiency and lower storage cost overhead than traditional indexes. Usually, learned indexes assume that the data is static or at least the data distribution is unchanged. However, the streaming scenarios break the strong assumption. This paper presents a learned index scheme for streaming scenarios (LIFOSS for short), where the workloads insert, delete, and query data arbitrarily. Precisely, LIFOSS consists of three parts: a) an adaptive packed-memory array which stores data and handles updates with lower bound of performance guaranteed; b) a middle-layer model group, used to fit the cumulative distribution function of data; c) a feedback mechanism designed to update parameters of the model group above in real-time locally. Extensive experiments on two public datasets show that LIFOSS performs better than the state-of-the-art dynamic learned index method. LIFOSS reduces the lookup latency by at least \(6\%\), and its dynamic performance is more stable, requiring only a tiny amount of extra space.
Similar content being viewed by others
References
Bender, M.A., Hu, H.: An adaptive packed-memory array. ACM Trans. Database Syst. 32(4), 26 (2007)
BFerragina, P., HVinciguerra, G.: The pgm-index: a fully-dynamic compressed learned index with provable worst-case bounds. VLDB 13(8), 1162–1175 (2020)
Chen, J., Zhong, M., Li, J., Wang, D., Qian, T., Tu, H.: Effective deep attributed network representation learning with topology adapted smoothing. IEEE Trans Cybern., https://doi.org/10.1109/TCYB.2021.3064092 (2021)
Davitkova, A., Milchevski, E., Michel, S.: The ML-Index: a multidimensional, learned index for point, range, and nearest-neighbor queries. In: EDBT, pp. 407–410 (2020)
Ding, J., Minhas, U.F., Yu, J., Wang, C., Do J., Li, Y., Zhang, H., Chandramouli, B., Gehrke, J., Kossmann, D., Lomet, D.B., Kraska, T., Fu, X., Xu, J., Lu, H.: ALEX: An updatable adaptive learned index. In: SIGMOD, pp. 969–984 (2019)
Ding, J., Nathan, V., Alizadeh, M., Kraska, T.: Tsunami: A learned multi-dimensional index for correlated data and skewed workloads. VLDB 14(2), 74–86 (2020)
Ferragina, P., Lillo, F., Vinciguerra, G.: Why are learned indexes so effective?. In: ICML, pp. 3123–3132 (2020)
Galakatos, A., Markovitch, M., Binnig, C., Fonseca, R., Kraska, T.: FITing-Tree: a data-aware index structure. In: SIGMOD, pp. 1189–1206 (2019)
Gao, Y., Ye, J., Gao, X., Chen, G.: Middle layer based scalable learned index scheme. Journal of Software 31(3), 620–633 (2020)
He, Y., Sick, B.: Clear: An adaptive continual learning framework for regression tasks. arxiv:2101.00926 (2021)
Huszár, R.: On quadratic penalties in elastic weight consolidation. arxiv:1712.03847 (2017)
Kipf, A., Marcus, R., Renen, A.V., Stoian, M., Kemper, A., Kraska, T., Neumann, T.: Sosd: A benchmark for learned indexes. arxiv:1911.13014 (2019)
Kraska, T., Alizadeh, M., Beutel, A., Chi, E.H., Kristo, A., Leclerc, G., Madden, S., Mao, H., Nathan, V.: SageDB: a learned database system. In: CIDR (2019)
Kraska, T., Beutel, A., Chi, E.H., Dean, J., Polyzotis, N.: The case for learned index structures. In: SIGMOD, pp. 489–504 (2018)
Leo, D.D., Boncz, P.A.: Packed memory arrays - rewired. In: ICDE, pp. 830–841 (2019)
Li, J., Cai, T., Ajmal, M., Li, R., Timos, S., Yu, J.: Holistic influence maximization for targeted advertisements in spatial social. In: ICDE, pp. 1340–1343 (2018)
Li, P., Lu, H., Zheng, Q., Yang, L., Pan, G.: LISA: A learned index structure for spatial data. In: SIGMOD, pp. 2119–2133 (2020)
Li, G., Zhou, X., Cao, L.: AI meets database: AI4DB and DB4AI. In: SIGMOD, pp. 2859–2866 (2021)
Li, Z., Wang, X., Li, J., Chen, Y., Zhang, Q.: Deep attributed network representation learning of complex coupling interaction. Knowl. Based Syst 212, 106618 (2021)
Nathan, V., Ding, J., Alizadeh, M., Kraska, T.: Learning multi-dimensional indexes. In: SIGMOD, pp. 985–1000 (2020)
Pandey, V., Renen, A.V., Kipf, A., Ding, J., Sabek, I., Kemper, A.: The case for learned spatial indexes. In: VLDB, pp. 2119–2133 (2020)
Patrick, E., Cheng, E., Gawlick, D., Elizabeth, J.: The log-structured merge-tree (lsm-tree). Acta Informatica 33(4), 351–385 (1996)
Schwarz, J., Czarnecki, W., Luketina, J., Teh, Y.W., Pascanu, R., Hadsell, R.: Progress & Compress: A scalable framework for continual learning. In: ICML, pp. 4535–4544 (2018)
Wang, H., Fu, X., Xu, J., Lu, H.: Learned index for spatial queries. In: MDM, pp. 569–574 (2019)
Wu, J., Zhang, Y., Chen, S., Chen, Y., Wang, J., Xing, C.: Updatable learned index with precise positions. VLDB 14(8), 1276–1288 (2021)
Xue, G., Zhong, M., Li, J., Chen, J., Zhai, C., Kong, R.: Dynamic network embedding survey. arxiv:2103.15447 (2021)
Yang, Y., Guan, Z., Li, J., Zhao, W., Cui, J., Wang, Q.: Interpretable and efficient heterogeneous graph convolutional network. In: TKDE, https://doi.org/10.1109/TKDE.2021.3101356 (2021)
Funding
This work was supported by the Natural Science Foundation of Jiangsu Province, China (Grant numbers BK20211307) and the Major Program of the Natural Science Foundation of Jiangsu Higher Education Institutions of China (Grant Nos. 18KJA520010, 19KJA610002).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
This article belongs to the Topical Collection: Special Issue on Decision Making in Heterogeneous Network Data Scenarios and Applications
Guest Editors: Jianxin Li, Chengfei Liu, Ziyu Guan, and Yinghui Wu
Rights and permissions
About this article
Cite this article
Yu, T., Liu, G., Liu, A. et al. LIFOSS: a learned index scheme for streaming scenarios. World Wide Web 26, 501–518 (2023). https://doi.org/10.1007/s11280-022-01021-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-022-01021-6