Best practice for a loosely-structured catalog · Issue #253 · w3c/dxwg · GitHub
Best practice for a loosely-structured catalog #253

Closed
@jakubklimek

Description

@dr-shorthair raised this in the mailing list:

I’ve been investigating some local repositories and catalogues, and have found that in many cases ‘datasets’ are ‘just a bag of files’. No distinction is made between part/whole, distribution (representation), and other kinds of relationship (e.g. documentation, schema, supporting documents). So while the precision we are aiming for in DCAT is clearly valuable in terms of semantics, it is difficult to implement on these legacy systems. Mostly I see people using the dataset-to-distribution relationship for everything, which is clearly incorrect in many cases. But I doubt we are unusual in this.

I’m thinking about how to advise on this, while not actually breaking DCAT.
If we made dcat:distribution a sub-property of dct:relation:
dcat:distribution rdfs:subPropertyOf dct:relation .

then I think we can have a reasonable recommendation to the simple repositories.
We could tell repositories that use the ‘just a bag of files’ approach to say

 :Dataset987 a dcat:Dataset ;
     dct:relation <file1> , <file2> , <file3> , <file4> , <file5> , <file6> , <file7> … .

which would not be inconsistent with a later reclassification to

  :Dataset987 a dcat:Dataset ;
              dct:hasPart <file1> , <file2> ;
              dcat:distribution <file3> , <file4> ;
              dct:conformsTo <file5> ;
              dct:requires <file6> ;
              dct:references <file7> .  

If this is not all mad, I will add a new use-case (something like ‘Mapping from simple repository model’) as justification, and propose this tiny enhancement.
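The "would not be inconsistent" claim above rests on rdfs:subPropertyOf entailment: every triple that uses a sub-property also implies the same triple with the super-property. The following is a minimal, dependency-free Python sketch of that check, not a full RDFS reasoner. It assumes the proposed `dcat:distribution rdfs:subPropertyOf dct:relation` axiom is in place, relies on the DCMI Terms declarations that make `dct:hasPart`, `dct:conformsTo`, `dct:requires` and `dct:references` sub-properties of `dct:relation`, and uses hypothetical `urn:example:` IRIs in place of the relative `<file1>` … `<file7>` and `:Dataset987` identifiers from the message. The helper name `entail_super_properties` is likewise made up for illustration.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# rdfs:subPropertyOf assertions, child property -> parent property.
SUB_PROPERTY_OF = {
    DCT + "hasPart": DCT + "relation",      # declared in DCMI Terms
    DCT + "conformsTo": DCT + "relation",   # declared in DCMI Terms
    DCT + "requires": DCT + "relation",     # declared in DCMI Terms
    DCT + "references": DCT + "relation",   # declared in DCMI Terms
    DCAT + "distribution": DCT + "relation",  # ASSUMPTION: the axiom proposed in this issue
}


def entail_super_properties(triples):
    """Return the triples plus everything implied by rdfs:subPropertyOf."""
    inferred = set(triples)
    for s, p, o in triples:
        parent = SUB_PROPERTY_OF.get(p)
        while parent is not None:  # walk up the property hierarchy
            inferred.add((s, parent, o))
            parent = SUB_PROPERTY_OF.get(parent)
    return inferred


# Hypothetical IRIs standing in for :Dataset987 and <file1> ... <file7>.
dataset = "urn:example:Dataset987"
files = {i: f"urn:example:file{i}" for i in range(1, 8)}

# The later, more precise classification.
refined = {
    (dataset, DCT + "hasPart", files[1]),
    (dataset, DCT + "hasPart", files[2]),
    (dataset, DCAT + "distribution", files[3]),
    (dataset, DCAT + "distribution", files[4]),
    (dataset, DCT + "conformsTo", files[5]),
    (dataset, DCT + "requires", files[6]),
    (dataset, DCT + "references", files[7]),
}

# The original "bag of files" description.
bag_of_files = {(dataset, DCT + "relation", f) for f in files.values()}

entailed = entail_super_properties(refined)

# Every coarse dct:relation statement is implied by the refined description.
assert bag_of_files <= entailed
```

Under these assumptions the refined description entails every statement in the coarse one, so a repository could publish the `dct:relation` form first and sharpen it later without retracting anything. Without the proposed axiom, the two `dcat:distribution` triples would not imply `dct:relation`, and the check would fail for `file3` and `file4`.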

I had a few concerns regarding this proposal:

  1. It is not clear to me from the description what exactly the file* IRIs are. If they are actual downloadable files, i.e. something that would otherwise be linked using dcat:downloadURL, I would object to allowing them to be linked directly from a dcat:Dataset record, as this would create a mess wherever a publisher is too lazy to describe the data properly.
  2. Would it be possible to get a few more detailed examples of how this would work?
  3. In my experience, data publishers misuse dcat:distribution mainly because of the lack of support for dataset series, which is being resolved in this DCAT revision. Once this support is added, publishers will be able to model many use cases correctly.
