TY  - JOUR
AU  - Cousseau, Vinícius M. R.
AU  - Barbosa, Luciano
PY  - 2019
TI  - Industrial Paper: Large-scale Record Linkage of Web-based Place Entities
JF  - Anais do Simpósio Brasileiro de Banco de Dados (SBBD); 2019: Anais do XXXIV Simpósio Brasileiro de Banco de Dados
DO  - 10.5753/sbbd.2019.8820
N2  - Extracting data about entities from the Web has become commonplace in industry and academia alike. Web-based entities, however, are inherently noisy and, as such, introduce several normalization issues that must be addressed in order to maintain a clean database. Record linkage, which refers to the detection of replicated data from possibly multiple sources, is one of the most critical of those issues. This paper presents a practical approach for solving the record linkage problem in the places data domain at an industrial scale, describing both a model that reaches a normalized Gini coefficient of 0.92 and an architecture that supports large-scale processing.
UR  - https://sol.sbc.org.br/index.php/sbbd/article/view/8820