Published October 21, 2022 | Version v3
Dataset
Open
Datasets for Supervised Matching in Clean-Clean Entity Resolution
Description
The repository includes 13 established datasets for evaluating ML- and DL-based matching algorithms:
- Structured DBLP-ACM
- Structured DBLP-Scholar
- Structured iTunes-Amazon
- Structured Walmart-Amazon
- Structured BeerAdvo-RateBeer
- Structured Amazon-Google Products
- Structured Fodors-Zagats
- Dirty DBLP-ACM
- Dirty DBLP-Scholar
- Dirty iTunes-Amazon
- Dirty Walmart-Amazon
- Textual Abt-Buy
- Textual CompanyA-CompanyB
Additionally, the repository includes eight new benchmark datasets, drawn from the following databases using a principled approach based on DeepBlocker:
- Abt-Buy
- Amazon-Google Products
- DBLP-ACM
- IMDB-TMDB
- IMDB-TVDB
- TMDB-TVDB
- Walmart-Amazon
- DBLP-Google Scholar
The datasets are available in different formats so that they can be processed by the following matching algorithms:
- EMTransformer
- GNEM
- HierMatcher
- Magellan
- ZeroER
Files (651.0 MB)
Preview: Dn1.zip
MD5 checksum | Size |
---|---|
md5:8eae3497432357c91e6fb98e866acc6d | 3.8 MB |
md5:011913c216a562a32911b5c0c9c25c41 | 8.0 MB |
md5:5a812df522406fa67522b37da9698593 | 443.6 kB |
md5:7b5ed058fa0e8972c89e436a10356c9c | 4.1 MB |
md5:ebb80bbcba16bcbea5d990adb1cd4795 | 8.2 MB |
md5:2b12d59c4340dddbd6582777688acd75 | 1.4 MB |
md5:ec3f4f1a09aa434b40b266d16799535d | 5.5 MB |
md5:7cb9486bd4b062943bd8d2b093cf7bc8 | 4.8 MB |
md5:5fb0bbec3869a9d9ce12e9ba6f3fe461 | 614.8 MB |
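The `md5:` values listed above can be used to check that a downloaded archive arrived intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_checksum` and the example path `downloaded.zip` are illustrative assumptions, and this listing does not show which checksum belongs to which file, so the pairing has to be taken from the repository page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file's MD5 against a listed value such as 'md5:8eae...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Hypothetical usage: the path is an assumption, not a file name from the listing.
# ok = matches_listed_checksum("downloaded.zip", "md5:8eae3497432357c91e6fb98e866acc6d")
```

Reading in chunks matters mainly for the largest archive (614.8 MB), which would otherwise be loaded into memory in one piece.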