
How do you make UIDs in ocd-id's consistent? #300

Open
jamesa opened this issue Apr 5, 2022 · 1 comment


jamesa commented Apr 5, 2022

Hi, I'm a beginner just getting familiar with your project. I'm mostly interested in your US data, and I have a question about the use of UIDs in OCD IDs.

I understand this repository holds the canonical OCD IDs for many jurisdictions, but I wasn't clear on whether there's any guidance on giving a bill or an event a canonical ID.

Example

For instance, from the datamade API referenced in your docs, I currently get this response from https://ocd.datamade.us/bills/?page=3:

{
  "results": [
    {
      "classification": [
        "ordinance"
      ],
      "id": "ocd-bill/45b448a4-86f0-4fae-8311-6cf958cf1557",
      "title": "Grant(s) of privilege in public way for Dream, Inc.",
      "subject": [
        "Grants of Privilege"
      ],
      "identifier": "O2020-3422",
      "from_organization": {
        "jurisdiction": {
          "id": "ocd-jurisdiction/country:us/state:il/place:chicago/government",
          "name": "Chicago City Government"
        },
        "id": "ocd-organization/ef168607-9135-4177-ad8e-c1f7a4806c3a",
        "name": "Chicago City Council"
      },
      "updated_at": "2020-07-22T23:51:15.477432+00:00"
    },
[...]

Using the identifier O2020-3422, I can find that bill in the Chicago Legistar. I searched around within Legistar, but I couldn't find anything matching the 45b448a4-86f0-4fae-8311-6cf958cf1557 ID to use as a reference.

If I were writing my own scraper, how would I ensure that my representation of the bill in this example, in terms of its generated OCD ID, remains consistent with the one returned from the datamade API? How do I generate that same ID independently of that API?

Along the same lines, if I were to publish some event not tracked by that API, but datamade later scraped the same event, I would want to make sure we ended up with the same generated ID.

The same goes for every other data type that uses UIDs (events, organizations, people, votes). If there's no canonical ID, is this up to each implementation to decide?

I'm probably missing some behavior that determines this in one of the scraper repos, but I'd appreciate any guidance you can provide on this. Thank you!

@showerst
Contributor

@jamesa, I just stumbled on this. FYI, those IDs are managed by another project. They're generated by scrapers that use opencivicdata/python-opencivicdata as a base. The UID code is in the models; here's the bill code. As you can see, it's currently using a UUID, so there's no good way to generate consistent entries.
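
To illustrate why a UUID-based ID can't be reproduced independently, here is a minimal Python sketch. It is not the python-opencivicdata code. `random_ocd_id` only mimics the general idea of minting a random UUID per record, and `deterministic_ocd_id` is a purely hypothetical alternative that derives the UUID from stable inputs with `uuid.uuid5`. Both function names and the `jurisdiction|identifier` key format are assumptions made for this example, and IDs produced this way would not match the ones the datamade API already serves.

```python
import uuid

# Values copied from the API response quoted in the question.
EXAMPLE_ID = "ocd-bill/45b448a4-86f0-4fae-8311-6cf958cf1557"
JURISDICTION = "ocd-jurisdiction/country:us/state:il/place:chicago/government"
IDENTIFIER = "O2020-3422"


def random_ocd_id(ocd_type: str) -> str:
    """Mint an ID from a random UUID: every call yields a new value."""
    return f"ocd-{ocd_type}/{uuid.uuid4()}"


def deterministic_ocd_id(ocd_type: str, jurisdiction_id: str, identifier: str) -> str:
    """Hypothetical scheme (an assumption, not the library's behavior):
    hash stable inputs into a name-based UUID so any scraper gets the same ID."""
    name = f"{jurisdiction_id}|{identifier}"
    return f"ocd-{ocd_type}/{uuid.uuid5(uuid.NAMESPACE_URL, name)}"


if __name__ == "__main__":
    # The version field of the quoted ID shows which UUID flavor it is.
    quoted = uuid.UUID(EXAMPLE_ID.split("/", 1)[1])
    print("quoted UUID version:", quoted.version)

    # Two scrapers minting random IDs for the same bill will disagree.
    print(random_ocd_id("bill"))
    print(random_ocd_id("bill"))

    # Two scrapers using the hypothetical name-based scheme will agree.
    print(deterministic_ocd_id("bill", JURISDICTION, IDENTIFIER))
    print(deterministic_ocd_id("bill", JURISDICTION, IDENTIFIER))
```

The quoted ID carries UUID version 4, the random variant, so there is no input from which it could be recomputed. In practice, matching an existing record means looking it up by a stable key such as the jurisdiction plus the `identifier` field, then reusing the stored ID.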
