Datasets for Supervised Matching in Clean-Clean Entity Resolution
Published October 21, 2022 | Version v3
Dataset Open

  • 1. University of Athens

Description

The repository includes 13 established datasets for evaluating ML- and DL-based matching algorithms:

  1. Structured DBLP-ACM
  2. Structured DBLP-Scholar
  3. Structured iTunes-Amazon
  4. Structured Walmart-Amazon
  5. Structured BeerAdvo-RateBeer
  6. Structured Amazon-Google Products
  7. Structured Fodors-Zagats
  8. Dirty DBLP-ACM
  9. Dirty DBLP-Scholar
  10. Dirty iTunes-Amazon
  11. Dirty Walmart-Amazon
  12. Textual Abt-Buy
  13. Textual CompanyA-CompanyB

Additionally, the repository includes eight new benchmark datasets, drawn from the following pairs of databases using a principled approach based on DeepBlocker:

  1. Abt-Buy
  2. Amazon-Google Products
  3. DBLP-ACM
  4. IMDB-TMDB
  5. IMDB-TVDB
  6. TMDB-TVDB
  7. Walmart-Amazon
  8. DBLP-Google Scholar

The datasets are available in different formats so that they can be processed by the following matching algorithms:

  1. EMTransformer
  2. GNEM
  3. HierMatcher
  4. Magellan
  5. ZeroER

Files

Dn1.zip

Files (651.0 MB)

Checksum (MD5)                          Size
md5:8eae3497432357c91e6fb98e866acc6d    3.8 MB
md5:011913c216a562a32911b5c0c9c25c41    8.0 MB
md5:5a812df522406fa67522b37da9698593    443.6 kB
md5:7b5ed058fa0e8972c89e436a10356c9c    4.1 MB
md5:ebb80bbcba16bcbea5d990adb1cd4795    8.2 MB
md5:2b12d59c4340dddbd6582777688acd75    1.4 MB
md5:ec3f4f1a09aa434b40b266d16799535d    5.5 MB
md5:7cb9486bd4b062943bd8d2b093cf7bc8    4.8 MB
md5:5fb0bbec3869a9d9ce12e9ba6f3fe461    614.8 MB
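
The MD5 values listed for each file can be used to check that a download completed intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listed_checksum` are illustrative and not part of the repository, and the listing here does not show which file name belongs to which checksum, so the path in the usage comment is a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that large archives do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:8eae...'.

    The optional 'md5:' prefix is stripped and the comparison
    is case-insensitive.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (placeholder path; pair it with the checksum shown for that file):
# ok = matches_listed_checksum("downloaded_file.zip",
#                              "md5:8eae3497432357c91e6fb98e866acc6d")
```

A mismatch usually indicates a truncated or corrupted download, in which case fetching the file again is the simplest remedy.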