Description
@dr-shorthair raised this on the mailing list:
I’ve been doing some investigations of some local repositories and catalogues, and have uncovered that in many cases ‘datasets’ are ‘just a bag of files’. There is no distinction made between part/whole, distribution (representation), and other kinds of relationship (e.g. documentation, schema, supporting documents). So while the precision we are aiming for in DCAT is clearly valuable in terms of semantics, it is difficult to implement on these legacy systems. Mostly I see people using the Dataset-distribution-> relationship for everything … which is clearly incorrect in many cases. But I doubt if we are unusual in this.
I’m thinking about how to advise on this, while not actually breaking DCAT.
If we made dcat:distribution a sub-property of dct:relation
dcat:distribution rdfs:subPropertyOf dct:relation .
then I think we can have a reasonable recommendation to the simple repositories.
We could tell repositories that use the ‘just a bag of files’ approach to say
:Dataset987 a dcat:Dataset ;
dct:relation <file1> , <file2> , <file3> , <file4> , <file5> , <file6> , <file7> … .
which would not be inconsistent with a later reclassification to
:Dataset987 a dcat:Dataset ;
dct:hasPart <file1> , <file2> ;
dcat:distribution <file3> , <file4> ;
dct:conformsTo <file5> ;
dct:requires <file6> ;
dct:references <file7> .
If this is not all mad, I will add a new use-case - something like ‘Mapping from simple repository model’ – as justification, and propose this tiny enhancement.
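To make the "would not be inconsistent" claim concrete, here is a minimal sketch in plain Python (no RDF library) that applies the RDFS sub-property rule (rdfs7) to the reclassified description and checks that every original dct:relation link is still entailed. The triple encoding, the `entail` helper, and the relative file names are illustrative assumptions. The four dct: axioms reflect what DCMI Metadata Terms declares as far as I know; the dcat:distribution axiom is the proposed addition and is not part of DCAT as published.

```python
# Sub-property axioms as a child -> parent map.
# The dct: entries are assumed to match DCMI Metadata Terms.
DCT_AXIOMS = {
    "dct:hasPart": "dct:relation",
    "dct:conformsTo": "dct:relation",
    "dct:requires": "dct:relation",
    "dct:references": "dct:relation",
}
# The proposal from this issue: NOT an existing DCAT axiom.
PROPOSED_AXIOMS = {**DCT_AXIOMS, "dcat:distribution": "dct:relation"}


def entail(triples, axioms):
    """Apply RDFS rule rdfs7 (sub-property inheritance) until nothing new is added."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            parent = axioms.get(p)
            if parent is not None and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


# The "bag of files" description: everything is just a dct:relation.
bag = {(":Dataset987", "dct:relation", f"file{i}") for i in range(1, 8)}

# The later, more precise reclassification.
refined = {
    (":Dataset987", "dct:hasPart", "file1"),
    (":Dataset987", "dct:hasPart", "file2"),
    (":Dataset987", "dcat:distribution", "file3"),
    (":Dataset987", "dcat:distribution", "file4"),
    (":Dataset987", "dct:conformsTo", "file5"),
    (":Dataset987", "dct:requires", "file6"),
    (":Dataset987", "dct:references", "file7"),
}

# Bag triples that the refined description fails to entail.
missing_without_proposal = bag - entail(refined, DCT_AXIOMS)
missing_with_proposal = bag - entail(refined, PROPOSED_AXIOMS)
```

Under these assumptions, `missing_with_proposal` is empty, while `missing_without_proposal` contains exactly the two dcat:distribution links (file3 and file4), which is the gap the proposed axiom would close.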
I had a few concerns regarding this proposal:
- It is not clear to me from the description what exactly the file* IRIs are. If they were actual downloadable files, i.e. something originally linked using dcat:downloadURL, I would disagree with allowing them to be linked directly from a dcat:Dataset record, as this would create a mess wherever a publisher is too lazy to describe the data properly.
- Would it be possible to get a few more detailed examples of how this would work?
- In my experience, data publishers misuse dcat:distribution mainly because of the lack of support for dataset series, which is being resolved in this DCAT revision. Once this support is added, publishers will be able to model many use cases correctly.