TY - JOUR
AU - Cousseau, Vinícius M. R.
AU - Barbosa, Luciano
PY - 2019
TI - Industrial Paper: Large-scale Record Linkage of Web-based Place Entities
JF - Anais do Simpósio Brasileiro de Banco de Dados (SBBD); 2019: Anais do XXXIV Simpósio Brasileiro de Banco de Dados
DO - 10.5753/sbbd.2019.8820
N2 - Extracting data about entities from the Web has become commonplace in industry and academia alike. Web-based entities, however, are inherently noisy and, as such, introduce several normalization issues that must be addressed in order to maintain a clean database. Record linkage, which refers to the detection of replicated data from possibly multiple sources, is one of the most critical of those issues. This paper presents a practical approach for solving the record linkage problem in the places data domain at an industrial scale, presenting both a model that reaches a normalized Gini coefficient of 0.92 and an architecture that supports large-scale processing.
UR - https://sol.sbc.org.br/index.php/sbbd/article/view/8820