Record and display whether IATI identifier has a matching prefix in org-id lists #6

Open
BobHarper1 opened this issue Jan 30, 2018 · 23 comments

@BobHarper1

BobHarper1 commented Jan 30, 2018

The namespace code part of organisation identifiers should be in the org-id register of lists.

This helps in answering the question:

which organisation does NL-KVK-41198677 refer to?

where having direct links to the registering agency's entry would assist traceability.

It would also help to be able to identify where a namespace does not match an existing list code, so that either an incorrect IATI identifier could be fixed, or a missing list added to org-id.
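A minimal sketch of the prefix check described above, assuming the list code is the first two hyphen-separated parts of the identifier. The function names and the `KNOWN_LIST_CODES` set are hypothetical; a real check would load the full register from org-id.guide.

```python
# Hypothetical subset of org-id.guide list codes (assumption for illustration).
KNOWN_LIST_CODES = {"NL-KVK", "GB-GOV", "XM-DAC"}


def split_org_id(org_id):
    """Split an identifier into (list code, registration number).

    Assumes the list code is the first two hyphen-separated parts,
    e.g. "NL-KVK-41198677" gives ("NL-KVK", "41198677").
    Returns None when the identifier does not have three non-empty parts.
    """
    parts = org_id.split("-", 2)
    if len(parts) < 3 or not all(parts):
        return None
    return "-".join(parts[:2]), parts[2]


def has_known_prefix(org_id, known_codes=KNOWN_LIST_CODES):
    """True when the identifier's prefix matches a known list code."""
    split = split_org_id(org_id)
    return split is not None and split[0] in known_codes
```

An identifier that fails this check is either malformed or uses a list that is missing from the register, which are the two cases the issue wants to surface.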

@andylolz
Member

andylolz commented Jan 31, 2018

either an incorrect IATI identifier could be fixed

Am I right in thinking it’s only incorrect if the IATI org file is v2.0x?

@BobHarper1
Author

BobHarper1 commented Jan 31, 2018

I think it is in 1.05, but as iati-identifier (changed to organisation-identifier in 2.01)? http://iatistandard.org/105/organisation-standard/iati-organisations/iati-organisation/iati-identifier/
http://iatistandard.org/105/organisation-identifiers/

@andylolz
Member

andylolz commented Jan 31, 2018

Sorry, what I mean is:

If you compare the description of the @ref attribute at v1.05 with the description at v2.01, you’ll see that at v2.01, there is a “MUST”, with a description of (although unfortunately no direct mention of) the organisation identifier format used by http://org-id.guide:

Machine-readable identification string for the organisation issuing the report. Must be in the format {RegistrationAgency}-{RegistrationNumber} where {RegistrationAgency} is a valid code in the RegistrationAgency code list and {RegistrationNumber } is a valid identifier issued by the {RegistrationAgency}

I think it is in 1.05, but as iati-identifier (changed to organisation-identifier in 2.01)?

Sorry, yes – the same applies for those two. organisation-identifier includes the “MUST” text, iati-identifier doesn’t.

So for instance, this is not an incorrect organisation identifier:
https://andylolz.github.io/org-id-finder/#46004

…because the file it’s declared in is v1.0x.

I think that’s right? Hmm – It seems like it must be wrong…!

@BobHarper1
Author

Seems like, but yes I think you are right!

Ok, so my proposal

where having direct links to the registering agency's entry would assist traceability.

It would also help to be able to identify where a namespace does not match an existing list code, so that either an incorrect IATI identifier could be fixed, or a missing list added to org-id.

Could that work by first determining the version that the identifier was published (adding a field for /iati-organisations/@version to the scraping process)?

  • Any identifier published in v2.0x should have the RegistrationAgency code, which can be linked to the org-id.guide entry (and flagged as missing if there is no match).
  • Any identifier published in a version before 2.0x might have a matching RegistrationAgency code, which could be linked (but not flagged if there is no match).
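The two rules above could be sketched roughly as follows. This is only an illustration: `check_identifier` and its arguments are hypothetical names, and it assumes the scraped `/iati-organisations/@version` value is available as a string such as "2.02" or "1.05".

```python
def check_identifier(org_id, version, known_codes):
    """Return (matched_code, flagged) for a self-declared identifier.

    matched_code is the list code to link to, or None if nothing matches.
    flagged is True only for v2.0x files with no matching list code;
    identifiers from earlier versions are never flagged.
    """
    parts = org_id.split("-", 2)
    prefix = "-".join(parts[:2]) if len(parts) == 3 else None
    matched = prefix if prefix in known_codes else None
    # A missing version attribute is treated as pre-2.0x (an assumption).
    is_v2 = str(version or "").startswith("2.")
    flagged = is_v2 and matched is None
    return matched, flagged
```

With `known_codes={"NL-KVK"}`, the identifier 46004 would be flagged in a 2.01 file but left alone in a 1.05 file.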

@andylolz
Member

andylolz commented Jan 31, 2018

So my initial reaction was “ooh, this is cool!” but I thought about it for a bit and had a few issues.

  1. This is a good idea, but it should be something the IATI registry refresher does. That’s because it seems to me like it’s between the registry and the publisher to resolve. People using these org IDs (who this tool is for) mostly shouldn’t need to care.
  2. I don’t see how this assists traceability, but perhaps that’s just me being dense. Not in the sense that I mean traceability, anyway, as described in the footnote in the README.
  3. “direct links to the registering agency's entry” does sound cool… But I don’t think org-id.guide gives me that (unless I’ve misunderstood, and we mean different things?) In the case of the example above, for instance, this is probably the right link. But org-id.guide only gets me to here.

I don’t mean to shut this issue down, and I’m still interested… I just have some reservations, so I thought I’d note them down.

@BobHarper1
Author

A key element of traceability (I would argue) is consistent approaches to identifier creation, so I think understanding the provenance of an identifier is important here. I can search a name (e.g. 'Hivos'), find its IATI identifier through org-id-finder, and know that 41198677 should match an organisation on the NL-KVK register; ergo, it is that organisation, and no other.

Plus, since the same process of identifier creation is extensible to other standards, then that helps too?

Re 3. Ah, I meant link to the relevant list's entry on the org-id site, rather than the agency's website (the purpose being that you have the information to find the right page ultimately, even if the follow-through link turns out to be dead... but I can fix that one now!).

@andylolz
Member

andylolz commented Feb 1, 2018

consistent approaches to identifier creation

If IATI publishers followed a consistent (i.e. reproducible) approach to identifier creation, I’m not sure this project would need to exist! Users would be able to figure out org IDs directly, by following the org-id.guide guidelines to reproduce.

This project instead shows the self-declared organisation identifiers – the provenance for which (the publisher’s IATI organisation file) is linked in the source dropdown. In IATI-land, it seems these identifiers sometimes fail to follow the consistent approach outlined by org-id.guide.


But can I ask… Which list would we validate against? The one on org-id.guide, or the codelist in the standard? The former, right? (The latter is legacy I think?)

@andylolz
Member

andylolz commented Feb 5, 2018

Aha – I think I finally understand!

@timgdavies suggests that even if a publisher self-declares an org ID, if it doesn’t conform to the org-id.guide format, an org-id.guide identifier should be (generated and) preferred.

If that’s the case, I’m happy to do these checks, and provide the recommended, consistent identifier (probably with some accompanying explanation).

@andylolz andylolz self-assigned this Feb 5, 2018
@timgdavies

But can I ask… Which list would we validate against? The one on org-id.guide, or the codelist in the standard? The former, right? (The latter is legacy I think?)

The org-id.guide one. I thought that the list on that page was supposed to be kept in sync with org-id.guide's XML output (which mirrors its structure), but it seems that is not happening.

@andylolz
Member

andylolz commented Feb 11, 2018

@BobHarper1 @timgdavies I’ve added the alternative, recommended org ID to this gist:
https://gist.github.com/andylolz/d16c35f190e2f3e8f4112cfa6728a8f3

Let me know if those look right / if I’ve missed any. It doesn’t find one for e.g. US-18 because it has to do a country name lookup to figure out the correct DAC donor code, and that bit goes wrong for countries where the DAC uses an abbreviated name (e.g. “United States” instead of “United States of America”).
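To illustrate the failure mode just described, and one possible workaround that the thread itself does not settle on, here is a sketch using a hypothetical alias table. `DAC_DONOR_NAMES`, `DAC_NAME_ALIASES` and `dac_donor_name` are made-up names, and the data is an illustrative subset.

```python
# Donor names as the DAC spells them (illustrative subset).
DAC_DONOR_NAMES = {"Australia", "United States"}

# Hypothetical alias table: full country name -> the DAC's abbreviated name.
DAC_NAME_ALIASES = {"United States of America": "United States"}


def dac_donor_name(country_name):
    """Return the DAC spelling of a country name, or None if not found.

    An exact match is tried first; the alias table is only a fallback
    for countries where the DAC uses an abbreviated name.
    """
    if country_name in DAC_DONOR_NAMES:
        return country_name
    alias = DAC_NAME_ALIASES.get(country_name)
    if alias in DAC_DONOR_NAMES:
        return alias
    return None
```

Without the alias fallback, "United States of America" finds no donor, which is why no suggestion is produced for US-18.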

If we’re happy with the recommendation algorithm, I’ll add it on the frontend. Btw the algorithm it uses is here.

To summarise the algorithm: Where value is a given org ID in an IATI organisation file…

  • If value uses the org-id.guide format ([^-]+-[^-]+-.+) and the prefix is on the list of lists, return it
  • If value uses the org-id.guide format and the prefix (when uppercased) is on the list of lists, return it with the prefix uppercased
  • If value looks like a DAC channel code (\d{5}), and is on the DAC channel code list, add the prefix XM-DAC- and return it
  • If value uses a format like AU-5 ([A-Z]{2}-\d+):
    • split it into {country code} and {agency code}
    • look up the country code to get a country name
    • look up the country name on the DAC donor list to get a DAC donor code
    • return XM-DAC-{DAC donor code}-{agency code} (in the case of AU-5, this would be XM-DAC-801-5)
  • If value is on IATIOrganisationIdentifier but is missing its XI-IATI- prefix, add the prefix
  • Give up (return None)
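The steps above can be sketched as a single function. This is a reading of the summary, not the code linked in the comment: all lookup tables below are tiny illustrative stand-ins (assumptions), and `suggest_org_id` is a hypothetical name.

```python
import re

# Illustrative stand-ins; real data would come from org-id.guide, the DAC
# codelists and the IATI OrganisationIdentifier codelist.
LIST_CODES = {"NL-KVK", "GB-GOV", "XM-DAC"}
DAC_CHANNEL_CODES = {"46004"}
COUNTRY_NAMES = {"AU": "Australia"}
DAC_DONOR_CODES = {"Australia": "801"}
IATI_ORG_IDS = {"XI-IATI-IADB"}


def suggest_org_id(value):
    """Return a recommended org ID for value, or None if no rule applies."""
    # Steps 1 and 2: org-id.guide format, prefix on the list of lists
    # (as given, or once uppercased).
    m = re.fullmatch(r"([^-]+-[^-]+)-(.+)", value)
    if m:
        prefix, rest = m.groups()
        if prefix in LIST_CODES:
            return value
        if prefix.upper() in LIST_CODES:
            return f"{prefix.upper()}-{rest}"
    # Step 3: bare five-digit DAC channel code.
    if re.fullmatch(r"\d{5}", value) and value in DAC_CHANNEL_CODES:
        return f"XM-DAC-{value}"
    # Step 4: {country code}-{agency code}, mapped via the DAC donor list.
    m = re.fullmatch(r"([A-Z]{2})-(\d+)", value)
    if m:
        country_code, agency_code = m.groups()
        country_name = COUNTRY_NAMES.get(country_code)
        donor_code = DAC_DONOR_CODES.get(country_name)
        if donor_code:
            return f"XM-DAC-{donor_code}-{agency_code}"
    # Step 5: value is on the IATI list once the XI-IATI- prefix is added.
    if f"XI-IATI-{value}" in IATI_ORG_IDS:
        return f"XI-IATI-{value}"
    # Step 6: give up.
    return None
```

Note that the rules are tried in order and the first match wins, so an identifier that already looks valid is returned unchanged without looking for a better candidate.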

@andylolz
Member

andylolz commented Feb 12, 2018

in the case of AU-5, this would be XM-DAC-801-5

^^ This step appears to be wrong… I’m not sure how it should work. Any clues?

@stevieflow

My understanding of AU-5 - Australian Agency for International Development is that the answer is on the OECD DAC Agency sheet

AU = 801
Australian Agency for International Development (from the IATI list) is not actually named on the current OECD DAC agency list.

Perhaps another way around this is just to convert the current OECD-DAC agency list with the XM-DAC prefix, and use that as the source (in other words, forget the original, yet outdated, IATI list)?
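As a rough sketch of that conversion, assuming the agency list can be read as (donor code, agency code, agency name) rows. The row layout and the function name `agency_org_ids` are assumptions, and the single row is illustrative rather than copied from the DAC sheet.

```python
def agency_org_ids(agency_rows):
    """Map XM-DAC-{donor code}-{agency code} identifiers to agency names."""
    return {
        f"XM-DAC-{donor_code}-{agency_code}": name
        for donor_code, agency_code, name in agency_rows
    }


# Illustrative row only (donor code, agency code, agency name).
rows = [(12, 1, "Department for International Development")]
ids = agency_org_ids(rows)
```

An identifier such as XM-DAC-801-5 would then simply be absent from the mapping if the agency is no longer on the DAC list.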

@stevieflow

stevieflow commented Feb 12, 2018

This step appears to be wrong… I’m not sure how it should work. Any clues?

I don't see the step as being wrong. Just that this agency is now not on the list.

One other factor - DFID would be XM-DAC-12-1 from this list, but we know DFID are GB-GOV-1 (from their own reporting-org and org file )

@andylolz
Member

andylolz commented Feb 12, 2018

I don't see the step as being wrong. Just that this agency is now not on the list.

Sorry – I should have given more details. I concluded it must be wrong because Netherlands use XM-DAC-7 rather than XM-DAC-7-1. I’m not sure if that’s a mistake on my part, or theirs.

One other factor - DFID would be XM-DAC-12-1 from this list, but we know DFID are GB-GOV-1 (from their own reporting-org and org file )

Yeah – If they’re using something that looks like a valid org ID (e.g. GB-GOV-1) then the algorithm returns that. I.e. it won’t keep trying to find a better solution. See line 91 here: valid_org_id is True, and there’s no suggested_org_id.

(in other words, forget the original, yet outdated, IATI list)?

Yeah exactly – the algorithm proposed above doesn’t touch the OrganisationIdentifier list at all. It uses the DAC codelists directly (although it uses the donor list rather than the agency list… Perhaps that’s wrong.) It should probably also use the XML that the DAC now provide, but at the moment it uses this datahub dataset.

@stevieflow

Sorry – I should have given more details. I concluded it must be wrong because Netherlands use XM-DAC-7 rather than XM-DAC-7-1. I’m not sure if that’s a mistake on my part, or theirs.

I can ask - I think it should be XM-DAC-7-1

@andylolz
Member

andylolz commented Feb 15, 2018

I can ask - I think it should be XM-DAC-7-1

Thanks!

There’s also Switzerland SDC (XM-DAC-CH-4) and Gates (XM-DAC-DAC-1601). There are probably more examples to boot!

@andylolz
Member

@BobHarper1 (but cc @stevieflow @timgdavies) it would be great if you could have a scan through https://gist.github.com/andylolz/d16c35f190e2f3e8f4112cfa6728a8f3 and check:

  1. the stuff that’s in the suggested_org_id column looks correct, and
  2. whether there’s anything missing from the suggested_org_id column

Once that’s approved, I can figure out how to report this on the front end, and then close up this issue. Thanks!

@stevieflow

Thanks @andylolz
I can see

  • the case-sensitivity example: Dutch KvK -> KVK
  • instances of the numeric DAC channel code : 46004 --> XM-DAC-46004
  • instances of donors that have country code etc : AU-5 --> XM-DAC-801-5

These seem all good for suggestions. Pinging @markbrough (re: https://discuss.iatistandard.org/t/why-does-2-02-include-a-code-list-that-was-not-supported-since-1-04/1101/9)

One example that might not be so useful, but fits the logic is : IADB --> XI-IATI-IADB

@andylolz
Member

andylolz commented Feb 28, 2018

Super useful – thanks @stevieflow!

These seem all good for suggestions.

Yeah, totally agree with the emphasis here. So I’m thinking I will still present the self-declared identifier in the same way, but just add something like a tooltip or an extra note somehow that says:

Psst… while this is the self-declared identifier… it doesn’t actually conform to the new methodology.
Here’s the one that does.
Maybe you could give them a nudge and let them know?
Kk thx.

…or thereabouts. What do you think?


One example that might not be so useful, but fits the logic is : IADB --> XI-IATI-IADB

Ooh, interesting! Pretend I know nothing (because I actually know nothing). Why wouldn’t that one be useful? (Perhaps I shouldn’t be generating XI-IATI identifiers? I could take that bit out if it’s not a good idea.)

@stevieflow

Re: XI-IATI - I'd agree to take these suggestions out, as this is the last resort.

But, I now realise that you might not be doing what I thought !

I thought you were suggesting XI-IATI-IADB because you'd found IADB. When we look at the OECD DAC list, there's a listing, but they seem to share that with others:

46012 | 2016 | IDB | Inter-American Development Bank, Inter-American Investment Corporation and Multilateral Investment Fund

Therefore, instead of XM-DAC-46012 I think there is a reason why it's XI-IATI-IADB (but the specific reason is actually missing from the changelog)

Still there? I think you were suggesting XI-IATI-IADB because it's in the registry but not in the org file, rather than suggesting it as a last resort

Yeah, any form of psst text is welcome :)

@andylolz
Member

andylolz commented Feb 28, 2018

When we look at the OECD DAC list, there's a listing, but they seem to share that with others

Aha! The “they seem to share that with others” bit sounds like a plausible explanation for why IATI might have decided to invent a new identifier.

I think you were suggesting XI-IATI-IADB because it's in the registry but not in the org file

That’s a good answer, but it’s not the right answer! I don’t look at the org IDs in the registry metadata at all (mostly because they’re wrong more often than not). Instead, this comes from this line of the algorithm:

If value is on IATIOrganisationIdentifier but is missing its XI-IATI- prefix, add the prefix

IADB is found on this list, so the XI-IATI- prefix is added. This is maybe a biiiit of a dodgy approach (since it could easily result in false positives) – happy to remove it.

@stevieflow

Aha! The “they seem to share that with others” bit sounds like a plausible explanation for why IATI might have decided to invent a new identifier.

Perhaps, although we don't really know (via IATI/IATI-Guidance#308 (comment))

IADB is found on this list, so the XI-IATI- prefix is added.

OK - I think that seems OK. There's no instance of "if not on any list, then suggest XI-IATI", is there?

@andylolz
Member

There's no instance of "if not on any list, then suggest XI-IATI", is there?

No way!


4 participants