
feat: source portal DataStore fallback for harvested datasets #20

@aborruso

Problem

When a dataset is harvested by an aggregator portal (e.g. dati.gov.it), the ckan_list_resources tool reports DataStore: No for its resources, even when the same resources are loaded in the DataStore on the source portal.

Real example

Querying "Rilevazione qualità aria 2025" (Comune di Milano) on dati.gov.it:

  • ckan_list_resources → DataStore: No
  • Same dataset on dati.comune.milano.it → DataStore: Yes

The resource download URL already contains the source portal domain, dataset ID, and resource ID:

https://dati.comune.milano.it/dataset/f02f7e96.../resource/9010ac16.../download/...

So the information needed to check the source portal is already available in the metadata.
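As an illustration, here is a minimal Python sketch of pulling those three pieces out of a download URL. It assumes the source portal uses CKAN's standard /dataset/<id>/resource/<id>/download/<file> route, possibly under a path prefix. SourceRef and parse_ckan_download_url are hypothetical names, not part of the existing tool.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class SourceRef(NamedTuple):
    """Hypothetical container for what a harvested download URL reveals."""
    portal_url: str   # scheme + host (+ optional path prefix) of the source CKAN
    dataset_id: str
    resource_id: str


# Assumes CKAN's default resource route; an optional path prefix (e.g. /ckan)
# before /dataset/ is kept as part of the portal URL.
_CKAN_RESOURCE_PATH = re.compile(
    r"^(?P<prefix>.*?)/dataset/(?P<dataset>[^/]+)"
    r"/resource/(?P<resource>[^/]+)(?:/download(?:/.*)?)?/?$"
)


def parse_ckan_download_url(url: str) -> Optional[SourceRef]:
    """Return (portal URL, dataset ID, resource ID), or None when the URL
    does not look like a CKAN resource/download URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    match = _CKAN_RESOURCE_PATH.match(parsed.path)
    if match is None:
        return None
    portal_url = f"{parsed.scheme}://{parsed.netloc}{match.group('prefix')}"
    return SourceRef(portal_url, match.group("dataset"), match.group("resource"))
```

URLs that do not follow the CKAN route (for example files hosted on a plain web server) yield None, so they would simply not trigger a source-portal check. The resource ID taken from the URL belongs to the source portal and may differ from the ID of the harvested copy.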

Proposed Solution

When datastore_active is false/null on a resource, inspect the download URL. If the domain differs from server_url, attempt a DataStore lookup on the source portal using the extracted resource ID, and report the result alongside the original resource metadata.
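The decision and lookup steps could look roughly like the Python sketch below. It assumes the lookup calls the source portal's CKAN Action API resource_show and reads the datastore_active flag from its result (datastore_search with limit=0 would be an alternative). needs_source_check, source_datastore_active and the injected fetch_json callable are hypothetical names; injecting the HTTP client keeps the logic testable without network access.

```python
import json
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlencode, urlparse

# Any callable that GETs a URL and returns the decoded JSON body.
FetchJson = Callable[[str], dict]


def urllib_fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Default fetcher built on the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def _host(url: str) -> str:
    return (urlparse(url).hostname or "").lower()


def needs_source_check(resource: dict, server_url: str) -> bool:
    """True when datastore_active is false/null and the download URL points
    to a different host than the portal being queried (server_url)."""
    if resource.get("datastore_active"):
        return False
    download_host = _host(resource.get("url") or "")
    return bool(download_host) and download_host != _host(server_url)


def source_datastore_active(
    portal_url: str, resource_id: str, fetch_json: FetchJson = urllib_fetch_json
) -> Optional[bool]:
    """Ask the source portal whether the resource is in its DataStore.
    Returns None when the answer is unknown (network error, not found, ...)."""
    api_url = (
        f"{portal_url.rstrip('/')}/api/3/action/resource_show?"
        + urlencode({"id": resource_id})
    )
    try:
        payload = fetch_json(api_url)
    except Exception:  # HTTP errors, timeouts, invalid JSON: treat as unknown
        return None
    if not isinstance(payload, dict) or not payload.get("success"):
        return None
    result = payload.get("result") or {}
    return bool(result.get("datastore_active"))
```

Returning None rather than False for failed lookups lets the response distinguish "the source portal says no" from "the source portal could not be checked". The plain host comparison treats www.example.org and example.org as different portals, which at worst costs one unnecessary lookup.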

New optional parameter check_source_portal (boolean, default true) on ckan_list_resources.

New fields in response: source_datastore_active, source_portal_url.
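A possible way to wire the new parameter and fields into the tool's output, sketched in Python. It assumes the two fields are added only to resources for which a source portal was identified; the proposal does not say whether they should otherwise be omitted or set to null. annotate_resources and the lookup callable are hypothetical: lookup stands for the URL-parsing and DataStore-lookup steps in the proposed solution and returns (source_portal_url, source_datastore_active), or (None, None) when the download URL does not point to another portal.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical: maps a download URL to (source_portal_url, source_datastore_active).
SourceLookup = Callable[[str], Tuple[Optional[str], Optional[bool]]]


def annotate_resources(
    resources: Iterable[dict],
    lookup: SourceLookup,
    check_source_portal: bool = True,
) -> list:
    """Return copies of the resources, adding source_portal_url and
    source_datastore_active where a source portal was found."""
    annotated = []
    for resource in resources:
        entry = dict(resource)  # never mutate the caller's metadata
        # Only resources without an active DataStore are checked.
        if check_source_portal and not resource.get("datastore_active"):
            portal_url, active = lookup(resource.get("url") or "")
            if portal_url is not None:
                entry["source_portal_url"] = portal_url
                entry["source_datastore_active"] = active
        annotated.append(entry)
    return annotated
```

With check_source_portal=False the function returns plain copies and performs no lookups, so callers who want to avoid extra requests to third-party portals can opt out.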

Spec

Full proposal and scenarios in OpenSpec change:
openspec/changes/add-source-portal-datastore-fallback/

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
