Internal country name representation #40

koen-vg · 2021-07-22T07:24:06Z

For the whole code-base, we need to agree on a way to internally represent country names. For WP2, I have adopted ISO 3166 two-letter country codes. For OSM data extraction, country names are used (see also this note in #37). We should find a convention and stick to it.

Personally, I would argue that ISO 3166 country codes (https://www.iso.org/iso-3166-country-codes.html) are the way to go, at least for internal representation in code. In WP2, I have already had to patch the powerplantmatching tool to work with two-letter country codes, because the databases being merged use different names for some countries. Country names depend on language, have short and long forms, and sometimes contain special characters (e.g. Côte d'Ivoire) which may or may not be converted to an ASCII equivalent depending on the data source. Therefore I think we would be setting ourselves up for trouble if we used full country names in code.
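To illustrate the spelling problem, here is a minimal standard-library sketch that folds accents to ASCII and looks names up in an alias table. `NAME_ALIASES`, `normalize_name` and `name_to_code` are hypothetical names, and the table holds only a few example entries; in practice the mapping would come from a maintained source such as pycountry.

```python
import unicodedata

# Hypothetical alias table (illustration only): normalized name -> alpha-2 code.
NAME_ALIASES = {
    "cote d'ivoire": "CI",
    "ivory coast": "CI",
    "gambia": "GM",
    "the gambia": "GM",
    "senegal": "SN",
}


def normalize_name(name: str) -> str:
    """Fold a country name to lowercase ASCII with single spaces."""
    # Replace the typographic apostrophe (U+2019) with a plain one.
    name = name.replace("\u2019", "'")
    # NFKD splits "ô" into "o" plus a combining accent; encoding to ASCII
    # with errors="ignore" then drops the accent.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.lower().split())


def name_to_code(name: str) -> str:
    """Return the alpha-2 code for a known spelling; raise KeyError otherwise."""
    return NAME_ALIASES[normalize_name(name)]
```

With this, `name_to_code("Côte d'Ivoire")` and `name_to_code("COTE D'IVOIRE")` both give "CI", but every new spelling found in some data source needs another table entry, which is the maintenance burden that using codes internally avoids.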

Of course, when it comes to presentation, we should use full country names. There is already a dictionary at https://github.com/pypsa-meets-africa/pypsa-africa/blob/main/scripts/iso_country_codes.py, and the Python package pycountry also provides convenient tools for working with country names.

The alternative of using full country names internally is of course also possible, but then we need to at the very least have a strict standard for which form of the names we use. Let's discuss!

As a side note, I think that using full names internally works in PyPSA-Eur, but even there it might have been easier to just go with country codes. I will probably raise the issue with powerplantmatching at least, and see whether upstream is interested in using two-letter country codes instead of full names (at least internally within powerplantmatching).


koen-vg · 2021-07-22T07:26:04Z

We can use this issue to discuss, and perhaps also to keep track of what needs to be done in the various work packages to implement whatever solution we come up with here.

mnm-matin · 2021-07-22T11:29:55Z

Full country names were used in early stages to assist development. Now that we are merging databases we can fully migrate to two-letter codes.

mnm-matin · 2021-07-22T11:33:43Z

relevant line 1

relevant line 2

Those two functions should be merged, as mentioned in #35.

koen-vg · 2021-07-22T12:35:12Z

Okay, sounds good! We should think about documenting this at some point...

If needed, there are some nice Python packages for working with countries. For example, I have used these to get a dictionary of all African country codes and names:

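```python
import pycountry as pc
import pycountry_convert as pcc

# Collect all countries that pycountry_convert assigns to Africa ("AF").
african_countries = []
for country in pc.countries:
    try:
        if pcc.country_alpha2_to_continent_code(country.alpha_2) == "AF":
            african_countries.append(country)
    except KeyError:
        # Some ISO codes have no continent mapping in pycountry_convert,
        # which then raises KeyError; skip those instead of hiding all errors.
        pass

# Map alpha-2 code -> English name, e.g. "ZA" -> "South Africa".
african_countries_map = {c.alpha_2: c.name for c in african_countries}
```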

The names we get from pycountry might also be a little more presentable than the current AFRICA_CC internal OSM names.

Also: yes, I get that Senegal and Gambia might as well be one country for large-scale energy system purposes, but could we avoid creating our own country code SNGM? I feel like this might get us into trouble in the future, and I would rather stick to ISO 3166 strictly.

mnm-matin · 2021-07-22T13:54:45Z

Thanks! SNGM definitely needs to be fixed. Unfortunately, Geofabrik provides a single PBF file covering both Senegal and Gambia.

We could:

1. Split the data (based on coordinates) into Senegal and Gambia. (This would require an analysis, as it might not be possible due to significant interconnection of the transmission network.)
2. Use full country names as a standard (downsides mentioned above; I would add that it leads to reduced readability and significantly larger file sizes).
3. Treat it as an edge case with a 4-letter code and document it.
4. ???

I would suggest temporarily removing support for Senegal and Gambia until splitting the data can be confirmed as feasible and implemented (in the data cleaning step, not the extraction). If that is not possible, we can explore the merits of the other options.
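If the combined extract is split by coordinates, the core step is deciding which country each feature lies in. Below is a minimal standard-library sketch using the ray-casting point-in-polygon test. `point_in_polygon`, `assign_country`, `split_by_country` and `TOY_SHAPES` are hypothetical names, and the rectangles are made-up placeholders, not real borders; real boundaries would come from a boundary dataset, and a library such as shapely (`Polygon.contains`) would normally do the test. Only point features (e.g. substations) are handled; lines crossing the border, which is where the interconnection concern comes in, would need an extra rule.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count edges crossed by a horizontal ray going right from pt."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        # (this also skips horizontal edges, so no division by zero below).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_country(pt: Point, shapes: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the first code (in dict order) whose polygon contains pt, else None."""
    for code, polygon in shapes.items():
        if point_in_polygon(pt, polygon):
            return code
    return None


def split_by_country(points: Sequence[Point], shapes) -> Dict[Optional[str], List[Point]]:
    """Group points by assigned country code; unmatched points go under None."""
    groups: Dict[Optional[str], List[Point]] = {}
    for pt in points:
        groups.setdefault(assign_country(pt, shapes), []).append(pt)
    return groups


# Made-up rectangles for illustration only (NOT real borders). GM is listed
# first: a simple Senegal outline without a hole would also contain Gambia.
TOY_SHAPES = {
    "GM": [(-16.8, 13.1), (-13.8, 13.1), (-13.8, 13.8), (-16.8, 13.8)],
    "SN": [(-17.6, 12.3), (-11.3, 12.3), (-11.3, 16.7), (-17.6, 16.7)],
}
```

Checking Gambia before Senegal is a workaround for the toy outline; with real polygons where Senegal's boundary excludes Gambia, the order would not matter.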

Although off-topic, this is probably a good example of the differences between Europe and Africa. As a result, both major and minor deviations from PyPSA-Eur might be necessary in the future.

pz-max · 2021-07-22T15:11:17Z

Hi guys,
I think the two-letter code should be our convention, for the reasons @koen-vg and @mnm-matin mentioned.
Providing a 2_letter_code_2_full_name function should then make things quite readable (this function could be put in the iso_country_codes script).
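A minimal sketch of such a helper. Note that `2_letter_code_2_full_name` is not a valid Python identifier (names cannot start with a digit), so the sketch uses the hypothetical name `code_to_name`. `CODE_TO_NAME` is a hypothetical excerpt; the actual contents and structure of `iso_country_codes.py` may differ.

```python
# Hypothetical excerpt of a code -> display name table (illustration only).
CODE_TO_NAME = {
    "CI": "Côte d'Ivoire",
    "GM": "Gambia",
    "NG": "Nigeria",
    "SN": "Senegal",
    "ZA": "South Africa",
}


def code_to_name(code: str) -> str:
    """Return the display name for an ISO 3166-1 alpha-2 code.

    Unknown codes are returned unchanged, so plots and tables never fail
    just because a name is missing from the table.
    """
    return CODE_TO_NAME.get(code.strip().upper(), code)
```

Returning the code unchanged is a judgment call; raising `KeyError` instead would surface missing table entries earlier.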

euronion · 2021-07-23T06:13:12Z

Strong endorsement of 2-letter country codes from my side.

However, one has to check each data source carefully to see whether it actually adheres to the codes. In my experience, there is usually a handful of undocumented exceptions.
Some codes are also subject to dispute and may not be interpreted identically across sources, see e.g.: https://en.wikipedia.org/wiki/ISO_3166-1#Naming_and_disputes
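A minimal sketch of such a per-source check, assuming each source provides its country codes as plain strings. It maps a small table of known non-ISO variants (Eurostat, for instance, uses `EL` for Greece and `UK` for the United Kingdom, where ISO 3166-1 has `GR` and `GB`) and counts whatever is still unrecognized. `KNOWN_VARIANTS`, `EXPECTED_CODES` and `check_codes` are hypothetical names, and the expected set is a small illustrative subset.

```python
from collections import Counter

# Known non-ISO variants seen in some sources (Eurostat uses EL and UK).
KNOWN_VARIANTS = {"EL": "GR", "UK": "GB"}

# Hypothetical subset of accepted ISO 3166-1 alpha-2 codes (illustration only).
EXPECTED_CODES = {"CI", "GB", "GM", "GR", "NG", "SN", "ZA"}


def check_codes(values):
    """Normalize codes and count the ones that are still not recognized.

    Returns (normalized, unknown): the cleaned list, and a Counter of codes
    that matched neither EXPECTED_CODES nor KNOWN_VARIANTS.
    """
    normalized = []
    unknown = Counter()
    for raw in values:
        code = str(raw).strip().upper()
        code = KNOWN_VARIANTS.get(code, code)
        if code not in EXPECTED_CODES:
            unknown[code] += 1
        normalized.append(code)
    return normalized, unknown
```

For example, `XK` (commonly used for Kosovo, but not an officially assigned ISO 3166-1 code) would show up in `unknown`, which is the kind of disputed case worth reviewing by hand.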

As for merged countries:
I'd also say to avoid creating our own country code. What you could do is name the combined region "SN-GM".
This makes it clear that:

- this is a merged region (ISO 3166 country codes do not contain "-");
- the constituent regions are "SN" and "GM".

It can also be extended to larger regions.
In general I am with @koen-vg that this should be avoided where possible. But if it cannot be easily avoided, using a combined country code might be the second-best option.
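A minimal sketch of how code could recognize and split such combined identifiers, assuming the "SN-GM" convention above. `is_merged` and `split_region` are hypothetical helpers. The dash test relies on ISO 3166-1 alpha-2 country codes never containing "-"; ISO 3166-2 subdivision codes do contain one, so the convention assumes subdivision codes are not used as region names.

```python
import re
from typing import List

# Exactly two uppercase ASCII letters: the shape of an alpha-2 code.
_ALPHA2 = re.compile(r"[A-Z]{2}")


def is_merged(region: str) -> bool:
    """Alpha-2 country codes never contain '-', so a dash marks a merged region."""
    return "-" in region


def split_region(region: str) -> List[str]:
    """Split e.g. "SN-GM" into ["SN", "GM"]; a plain code gives a 1-element list.

    Raises ValueError if a part is not two uppercase letters, which also
    rejects ad-hoc codes such as "SNGM".
    """
    parts = region.split("-")
    for part in parts:
        if not _ALPHA2.fullmatch(part):
            raise ValueError(f"not an alpha-2 country code: {part!r}")
    return parts
```

The regular expression only checks the shape of each part, not whether the code is actually assigned; combining it with a lookup against pycountry would catch typos such as "SG" for Senegal.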

davide-f · 2021-07-31T11:35:21Z

From looking around, pycountry is also used by PyPSA-Eur, so it may be the way to go. PyPSA-Eur seems to use the two-letter standard; we may follow it if appropriate.
By the way, there are a lot of disputes over territories, and I believe we should avoid problems on that front.

This code may be what we are looking for to convert between different code formats (alpha-2, alpha-3, etc.):

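```python
import numpy as np
import pycountry as pyc

def _get_country(target, **keys):
    # Look up attribute `target` of the country matching exactly one key.
    assert len(keys) == 1
    try:
        return getattr(pyc.countries.get(**keys), target)
    except (KeyError, AttributeError):
        # Unknown key: KeyError (older pycountry) or None -> AttributeError.
        return np.nan
```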

Example of use:
- alpha-3 code from alpha-2 code: `_get_country('alpha_3', alpha_2="ZA")` returns "ZAF"
- alpha-2 code from alpha-3 code: `_get_country('alpha_2', alpha_3="ZAF")` returns "ZA"

davide-f · 2021-07-31T11:35:51Z

Should we add it in _helpers?

koen-vg added the "help wanted" label (Extra attention is needed) on Jul 22, 2021

koen-vg mentioned this issue on Jul 22, 2021 in "Integrate Openstreetmap powerplant data with other sources using powerplantmatching #37" (closed)

davide-f mentioned this issue on Jul 31, 2021 in "Update helpers with country file name conversions #51" (closed)

mnm-matin referenced this issue on Aug 3, 2021 in commit ddf5127: "add sn-gm edge cases"

pz-max closed this as completed on Sep 10, 2021

energyLS mentioned this issue on Feb 14, 2023 in "Error: SNGM not found in download_osm_data.py #604" (closed)
