Handle different primary keys? #2

lapidus · 2019-01-18T07:51:54Z

I need to experiment a bit more with the library, but I'm not sure it has the functionality to specify primary keys or generate correct keys for a sparse dataframe.

Currently it assumes that each non-measure is a dimension for every measure:

x['concept_type'] != 'measure'
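To illustrate what that filter implies, here is a minimal sketch. The shape of the concept list (a list of dicts with a `concept` key) and the concept types chosen are assumptions made for illustration, not the library's actual internals. Since every non-measure ends up in one shared dimension list, all measures get the same primary key:

```python
# Hypothetical concept list; the "concept" key and the concept types
# are assumptions for illustration only.
concepts = [
    {"concept": "geo", "concept_type": "entity_domain"},
    {"concept": "gender", "concept_type": "entity_domain"},
    {"concept": "year", "concept_type": "time"},
    {"concept": "lex", "concept_type": "measure"},
    {"concept": "gdp", "concept_type": "measure"},
]

# Every non-measure is treated as a dimension of every measure.
dimensions = [x["concept"] for x in concepts if x["concept_type"] != "measure"]
measures = [x["concept"] for x in concepts if x["concept_type"] == "measure"]

# One shared primary key, so each measure gets the same "by" suffix.
file_names = [
    f"ddf--datapoints--{measure}--by--" + "--".join(dimensions)
    for measure in measures
]
print(file_names)
```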

I am thinking of a scenario where you have a more sparse frame:

geo, year, gender, lex, gdp
swe, 2000,  ,   , 25444
swe, 2000,  , 88,
swe, 2000, m, 88,
nor, 1970, m, 88,

I am not sure if this would occur in the wild or if one would try to make different dataframes?

But in the above case the files expected would be something like:

ddf--datapoints--gdp--by--geo--year
ddf--datapoints--lex--by--geo--year
ddf--datapoints--lex--by--geo--gender--year
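
A rough sketch of how those names could be derived from the sparse frame with pandas, assuming the dimension and measure columns are known up front and that an empty cell means "this dimension does not apply to this row". The function name `datapoint_names` is hypothetical and not part of the library:

```python
import io

import pandas as pd

# The sample frame from above, with empty cells read as NaN.
CSV = """geo,year,gender,lex,gdp
swe,2000,,,25444
swe,2000,,88,
swe,2000,m,88,
nor,1970,m,88,
"""

# Assumed to be known in advance; the order controls the file name.
DIMENSIONS = ["geo", "gender", "year"]
MEASURES = ["lex", "gdp"]


def datapoint_names(df, dimensions, measures):
    """Collect one file name per (measure, set of filled dimensions)."""
    names = set()
    for measure in measures:
        # Only rows that actually carry a value for this measure.
        observed = df[df[measure].notna()]
        for _, row in observed.iterrows():
            key = [d for d in dimensions if pd.notna(row[d])]
            names.add(f"ddf--datapoints--{measure}--by--" + "--".join(key))
    return sorted(names)


df = pd.read_csv(io.StringIO(CSV))
names = datapoint_names(df, DIMENSIONS, MEASURES)
for name in names:
    print(name)
```

Row-wise iteration keeps the sketch easy to read; for large frames a vectorized approach would be preferable.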


miroli · 2019-01-18T12:17:38Z

~~Could you clarify what the variable lex refers to in the example?~~ Never mind, found the data.

miroli · 2019-01-18T13:34:22Z

@lapidus I think I understand the issue, but for the sake of clarity, could you very briefly specify expected vs actual output of the data in the example?

lapidus · 2019-01-18T15:08:46Z

Overall I think I need to experiment a bit more and understand when exactly we would export multiple indicators from one big dataframe vs having multiple dataframes that generate one indicator each.

But the scenario I described above would result in this actual output:

Primary key: geo-gender-year
ddf--datapoints--gdp--by--geo--gender--year
ddf--datapoints--lex--by--geo--gender--year
ddf--datapoints--lex--by--geo-gender--year

Where the preferred output is:

Primary key: varying depending on data source availability
ddf--datapoints--gdp--by--geo--year
ddf--datapoints--lex--by--geo--year
ddf--datapoints--lex--by--geo--gender--year

lapidus · 2019-01-18T15:12:18Z

Maybe let's simply put this to test with 3-4 different data sources and see if we can streamline further :)

For example these use cases — Produce a DDF from:

3-4 indicators from SCB with different dimensionality
Same with Kolada
Same with "Daniel's big democracy file" (= one long file)
Other examples ...

I'll try some things from my side, and I might submit issues or pull requests :)

miroli · 2019-01-18T16:25:46Z

I think there are two issues at play here.

1. Tidy data
I think it's reasonable to let frame2package assume the input data always adheres to the tidy data format. In this case, I believe the sample data fails to meet requirement no. 3: "Each type of observational unit forms a table." as GDP by its nature describes whole populations/countries and life expectancy can refer to segments of the population. So these are probably two different tables?

2. Disaggregation levels
According to the DDF specs:

If you have different disaggregation levels, each level gets its own file. This is because the disaggregation dimensions are the (compound) primary key. With a different disaggregation, there's a different primary key and thus a different table.

which I believe is what you are referring to with the preferred output example? I will have to have a think about how to deal with this in an automated fashion.
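
One possible way to automate this, sketched under assumptions rather than as how frame2package does or should do it: for each measure, group the rows by which dimension columns are filled in, and emit one table per combination. The helper name `split_by_disaggregation` and the convention that an empty cell means "not disaggregated by this dimension" are both assumptions.

```python
import io

import pandas as pd


def split_by_disaggregation(df, dimensions, measures):
    """Return {(measure, primary_key): table}, one table per disaggregation level."""
    tables = {}
    for measure in measures:
        # Keep only rows that have a value for this measure.
        observed = df[df[measure].notna()]
        filled = observed[dimensions].notna().to_numpy()
        # Primary key of each row = the dimensions that are filled in.
        keys = [
            tuple(d for d, ok in zip(dimensions, row) if ok) for row in filled
        ]
        for key in sorted(set(keys)):
            mask = [k == key for k in keys]
            columns = list(key) + [measure]
            tables[(measure, key)] = observed.loc[mask, columns].reset_index(drop=True)
    return tables


# Sample data from the issue description; empty cells become NaN.
csv = """geo,year,gender,lex,gdp
swe,2000,,,25444
swe,2000,,88,
swe,2000,m,88,
nor,1970,m,88,
"""
df = pd.read_csv(io.StringIO(csv))
tables = split_by_disaggregation(df, ["geo", "gender", "year"], ["lex", "gdp"])
for (measure, key), table in tables.items():
    print(measure, key, len(table))
```

Each resulting table has its own compound primary key, which matches the "different disaggregation, different table" rule quoted above. Whether empty cells are a reliable signal for the disaggregation level is a judgment call that depends on the data source.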

Please let me know if I've misunderstood something. :)

miroli · 2019-01-23T21:29:22Z

I believe point 2 above and the original question in this issue were resolved with this commit. Please let me know if that is not the case.

lapidus changed the title ~~How to specify primary keys?~~ Handle different primary keys? Jan 18, 2019
