Internal country name representation #40
Comments
We can use this issue to discuss, and maybe also to keep track of what needs to be done in the various work packages to implement whatever solution we come up with here.
Full country names were used in the early stages to assist development. Now that we are merging databases, we can fully migrate to two-letter codes.
Those two functions should be merged as mentioned in #35
Okay, sounds good! We should think about documenting this at some point... If needed, there are some nice Python packages for working with countries. I have used these, for example, to get a dictionary with all African country codes and names:

    import pycountry as pc
    import pycountry_convert as pcc

    # Collect every pycountry entry whose alpha-2 code maps to continent 'AF'
    african_countries = []
    for country in pc.countries:
        try:
            if pcc.country_alpha2_to_continent_code(country.alpha_2) == 'AF':
                african_countries.append(country)
        except KeyError:
            # pycountry_convert has no continent for some codes; skip those
            pass

    # Map alpha-2 code -> country name
    african_countries_map = {c.alpha_2: c.name for c in african_countries}

The names we get from

Also: yes, I get that Senegal and Gambia might as well be one country for large-scale energy systems purposes, but could we avoid creating our own country code SNGM? I feel like this might get us in trouble in the future, and I would rather stick to ISO 3166 strictly.
Thanks! SNGM definitely needs to be fixed. Unfortunately, we receive a single pbf file from Geofabrik for Senegal and Gambia. We could:
I would suggest temporarily removing support for Senegal and Gambia until the splitting of the data can be confirmed and implemented (in the data cleaning, not in the extraction). If that is not possible, we can explore the merits of the other options. Although off-topic, this is probably a good example of the differences between Europe and Africa. As a result, both major and minor deviations from PyPSA-Eur might be necessary in the future.
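One possible way to keep the internal codes strictly ISO 3166 while the download remains a single file is to record which alpha-2 codes each extract covers, instead of inventing a code for the extract itself. This is only a sketch: the extract identifiers, the `EXTRACT_TO_ALPHA2` table and both helper functions are hypothetical names chosen for illustration, and it does not address splitting the data itself.

```python
# Hypothetical table: extract identifier -> ISO 3166-1 alpha-2 codes it covers.
# The identifiers are assumptions for illustration, not confirmed file names.
EXTRACT_TO_ALPHA2 = {
    "senegal-and-gambia": ("SN", "GM"),
    "nigeria": ("NG",),
}


def countries_in(extract):
    """Return the alpha-2 codes covered by one extract."""
    return EXTRACT_TO_ALPHA2[extract]


def extracts_for(country_codes):
    """Return the sorted extract identifiers needed for the requested codes."""
    wanted = set(country_codes)
    return sorted(
        name for name, codes in EXTRACT_TO_ALPHA2.items() if wanted & set(codes)
    )
```

With this layout, a request for "GM" alone resolves to the combined extract, and the cleaning step can then assign each feature to SN or GM once a splitting method is agreed on.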
Hi guys,
Strong endorsement of two-letter country codes from my side, although one has to check carefully whether each data source adheres to the codes. In my experience, there are usually a handful of undocumented exceptions. As for merged countries:
It can also be extended to larger regions.
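The check on data sources mentioned above could be sketched roughly as follows. The function name `check_codes`, the alias table and the example inputs are assumptions made for illustration; a real set of valid codes would come from the agreed ISO 3166 list.

```python
def check_codes(codes, valid_codes, aliases=None):
    """Split raw codes into (cleaned, offenders).

    cleaned   -- valid alpha-2 codes, upper-cased, with aliases applied
    offenders -- sorted list of raw values that could not be resolved
    """
    aliases = aliases or {}
    cleaned, offenders = [], set()
    for code in codes:
        candidate = code.strip().upper()
        # Apply source-specific aliases (assumed, maintained per data source)
        candidate = aliases.get(candidate, candidate)
        if candidate in valid_codes:
            cleaned.append(candidate)
        else:
            offenders.add(code)
    return cleaned, sorted(offenders)


# Illustrative inputs only
cleaned, offenders = check_codes(
    ["ng", "UK", "SNGM", " GH "],
    valid_codes={"NG", "GH", "GB"},
    aliases={"UK": "GB"},
)
```

Reporting the offenders rather than silently dropping them makes the undocumented exceptions of each source visible during data cleaning.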
From some scouting: pycountry is also used by PyPSA-Eur and may be the way to go. PyPSA-Eur seems to use the two-letter standard; we may follow it if appropriate. This code may be what we are looking for to convert between the different codes (alpha-2, alpha-3, etc.).
Example of use:
Should we add it in _helpers?
For the whole code-base, we need to agree on a way to internally represent country names. For WP2, I have adopted ISO 3166 two-letter country codes. For OSM data extraction, country names are used (see also this note in #37). We should find a convention and stick to it.
Personally, I would argue that ISO 3166 country codes (https://www.iso.org/iso-3166-country-codes.html) are the way to go, at least for internal representation in code. In WP2, I have already had to patch the powerplantmatching tool to work with two-letter country codes, because the different databases being merged use different names for some countries. Country names depend on language, have short and long forms and sometimes contain special characters (e.g. Côte d'Ivoire) which may or may not be converted to ASCII equivalent depending on the data source. Therefore I think we are setting ourselves up for trouble if we want to use full country names in code.
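To illustrate why names are fragile as keys, here is a minimal sketch of what mapping free-form names to two-letter codes involves. The lookup table `NAME_TO_ALPHA2` and both function names are hypothetical; a real table would need to cover every country and spelling variant found in the data sources.

```python
import unicodedata

# Hypothetical lookup table for illustration only
NAME_TO_ALPHA2 = {
    "cote d'ivoire": "CI",
    "ivory coast": "CI",
    "nigeria": "NG",
    "senegal": "SN",
    "gambia": "GM",
    "the gambia": "GM",
}


def normalize_name(name):
    """Strip accents, unify apostrophes, lower-case and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.replace("\u2019", "'").lower().split())


def to_alpha2(name):
    """Return the alpha-2 code for a country name; raises KeyError if unknown."""
    return NAME_TO_ALPHA2[normalize_name(name)]
```

Every spelling variant needs its own entry or normalization rule, whereas a two-letter code needs neither.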
Of course, when it comes to presentation, we should use full country names. There is already the dictionary at https://github.com/pypsa-meets-africa/pypsa-africa/blob/main/scripts/iso_country_codes.py, and the Python package pycountry also provides easy tools for working with country names.
The alternative of using full country names internally is of course also possible, but then we need to at the very least have a strict standard for which form of the names we use. Let's discuss!
As a side note, I think that using full names internally in PyPSA-Eur works, but even there it might have been easier to just go for the country codes. I will probably raise the issue at least with powerplantmatching and see whether upstream is interested in using two-letter country codes instead of full names (at least internally within powerplantmatching).