Abstract
Linking historical census data across time is a challenging task due to various reasons, including data quality, limited individual information, and changes to households over time. Although most census data linking methods link records that correspond to individual household members, recent advances show that linking households as a whole provide more accurate results and less multiple household links. In this paper, we introduce a graph-based method to link households, which takes the structural relationship between household members into consideration. Based on individual record linking results, our method builds a graph for each household, so that the matches are determined by both attribute-level and record-relationship similarity. Our experimental results on both synthetic and real historical census data have validated the effectiveness of this method. The proposed method achieves an F-measure of 0.937 on data extracted from real UK census datasets, outperforming all alternative methods being compared.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bishop, C.: Pattern Recognition and Machine Learning. Springer (2006)
Bloothooft, G.: Multi-source family reconstruction. History and Computing 7(2), 90–103 (1995)
Caetano, T., McAuley, J., Cheng, L., Le, Q.V., Smola, A.: Learning graph matching. IEEE TPAMI 31(6), 1048–1058 (2009)
Christen, P.: Febrl: an open source data cleaning, deduplication and record linkage system with a graphical user interface. In: ACM KDD, Las Vegas, pp. 1065–1068 (2008)
Christen, P.: Data Matching - Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer (2012)
Dietterich, T.G., Lathrop, R.H., Lozano-Perez, T.: Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence 89, 31–71 (1997)
Domingos, P.: Multi-relational record linkage. In: KDD Workshop, pp. 31–48 (2004)
Elmagarmid, A.K., Ipeirotis, P.G., Verykios, V.S.: Duplicate record detection: A survey. IEEE TKDE 19(1), 1–16 (2007)
Fu, Z., Christen, P., Boot, M.: Automatic cleaning and linking of historical census data using household information. In: IEEE ICDM Workshop, pp. 413–420 (2011)
Fu, Z., Zhou, J., Christen, P., Boot, M.: Multiple instance learning for group record linkage. In: Tan, P.-N., Chawla, S., Ho, C.K., Bailey, J. (eds.) PAKDD 2012, Part I. LNCS, vol. 7301, pp. 171–182. Springer, Heidelberg (2012)
Fure, E.: Interactive record linkage: The cumulative construction of life courses. Demographic Research 3, 11 (2000)
Hall, R., Fienberg, S.: Valid statistical inference on automatically matched files. In: Domingo-Ferrer, J., Tinnirello, I. (eds.) PSD 2012. LNCS, vol. 7556, pp. 131–142. Springer, Heidelberg (2012)
Hosmer, D.W., Lemeshow, S., Sturdivant, R.X.: Applied Logistic Regression, 3rd edn. Wiley (2013)
Munkres, J.: Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics 5(1), 32–38 (1957)
Nuray-Turan, R., Kalashnikov, D.V., Mehrotra, S.: Self-tuning in graph-based reference disambiguation. In: Kotagiri, R., Radha Krishna, P., Mohania, M., Nantajeewarawat, E. (eds.) DASFAA 2007. LNCS, vol. 4443, pp. 325–336. Springer, Heidelberg (2007)
On, B.W., Koudas, N., Lee, D., Srivastava, D.: Group linkage. In: IEEE ICDE, Istanbul, Turkey, pp. 496–505 (2007)
Quass, D., Starkey, P.: Record linkage for genealogical databases. In: ACM KDD Workshop, Washington, DC, pp. 40–42 (2003)
Ravikumar, P., Cohen, W.W.: A hierarchical graphical model for record linkage. In: UAI, pp. 454–461 (2004)
Ruggles, S.: Linking historical censuses: a new approach. History and Computing 14(1+2), 213–224 (2006)
Sadinle, M., Fienberg, S.: A generalized Fellegi-Sunter framework for multiple record linkage with application to homicide record systems. Journal of the American Statistical Association 108(502), 385–397 (2013)
Zager, L., Verghese, G.: Graph similarity scoring and matching. Applied Mathematics Letters 21(1), 86–94 (2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Fu, Z., Christen, P., Zhou, J. (2014). A Graph Matching Method for Historical Census Household Linkage. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science(), vol 8443. Springer, Cham. https://doi.org/10.1007/978-3-319-06608-0_40
Download citation
DOI: https://doi.org/10.1007/978-3-319-06608-0_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06607-3
Online ISBN: 978-3-319-06608-0
eBook Packages: Computer ScienceComputer Science (R0)