{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,7]],"date-time":"2024-08-07T23:35:32Z","timestamp":1723073732295},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2021,7]]},"abstract":"Machine learning techniques have been proposed to optimize the databases. For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view advisor) cannot meet the high-performance requirement for large-scale database instances, various applications and diversified users, especially on the cloud. Fortunately, machine learning based techniques can alleviate this problem by judiciously selecting optimization strategy. In this tutorial, we categorize database tasks into three typical problems that can be optimized by different machine learning models, including NP-hard problems (e.g., knob space exploration, index\/view selection, partition-key recommendation for offline optimization; query rewrite, join order selection for online optimization), regression problems (e.g., cost\/cardinality estimation, index\/view benefit estimation, query latency prediction), and prediction problems (e.g., query workload prediction). We review existing machine learning based techniques to address these problems and provide research challenges.<\/jats:p>","DOI":"10.14778\/3476311.3476405","type":"journal-article","created":{"date-parts":[[2021,10,28]],"date-time":"2021-10-28T22:48:56Z","timestamp":1635461336000},"page":"3190-3193","source":"Crossref","is-referenced-by-count":20,"title":["Machine learning for databases"],"prefix":"10.14778","volume":"14","author":[{"given":"Guoliang","family":"Li","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Xuanhe","family":"Zhou","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"given":"Lei","family":"Cao","sequence":"additional","affiliation":[{"name":"MIT"}]}],"member":"320","published-online":{"date-parts":[[2021,10,28]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3064029"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.14778\/3450980.3450992"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397536.3426358"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2903733"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3324957"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389711"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.14778\/3329772.3329780"},{"key":"e_1_2_1_8_1","volume-title":"Diversifying database activity monitoring with bandits. CoRR, abs\/1910.10777","author":"Grushka-Cohen H.","year":"2019"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00217"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389741"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389704"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.14778\/3384345.3384349"},{"key":"e_1_2_1_13_1","volume-title":"CIDR","author":"Idreos S.","year":"2019"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-07455-9_43"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457546"},{"key":"e_1_2_1_16_1","volume-title":"CIDR","author":"Kipf A.","year":"2019"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407832"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380591"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3412106"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457542"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476380"},{"issue":"2","key":"e_1_2_1_22_1","first-page":"70","article-title":"Xuanyuan: An ai-native database","volume":"42","author":"Li G.","year":"2019","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352129"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196908"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389768"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211954.3211957"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342644"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342646"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3380579"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3389727"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2742911"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352115"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3329859.3329871"},{"key":"e_1_2_1_34_1","first-page":"126","volume-title":"BLOCKCHAIN","author":"Singh H. J.","year":"2019"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368296"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452790"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/3339490.3339503"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300088"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3003665.3003669"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3452830"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3368289.3368294"},{"key":"e_1_2_1_42_1","first-page":"196","volume-title":"ICDE 2020","author":"Yu X.","year":"2019"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00133"},{"key":"e_1_2_1_44_1","volume-title":"Buffer pool aware query scheduling via deep reinforcement learning. CoRR, abs\/2007.10568","author":"Zhang C.","year":"2020"},{"key":"e_1_2_1_45_1","volume-title":"VLDBJ","author":"Zhang J.","year":"2021"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300085"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457291"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.2994641"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476311.3476334"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14778\/3397230.3397238"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3476311.3476405","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:44:58Z","timestamp":1672227898000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3476311.3476405"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7]]},"references-count":50,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2021,7]]}},"alternative-id":["10.14778\/3476311.3476405"],"URL":"https:\/\/doi.org\/10.14778\/3476311.3476405","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2021,7]]}}}