Figure out primary keys and update/alter #4

simonw · 2020-02-16T18:38:55Z

geojson-to-sqlite supports both --pk= and --alter options.

I'm not sure if --pk makes sense for Shapefiles - they appear to have a unijque ID property that I could use, but I'm not certain if it is guaranteed to be present. It's been present in the files I've looked at so far.

Adding tests that exercise --alter plus upserting data into existing tables would be worthwhile.

The text was updated successfully, but these errors were encountered:

simonw · 2020-02-17T01:10:04Z

I'm pretty confident that the "id" property Fiona loads from the shapefile (which starts at 0 and increments from there) can be considered unique. It appears to be the value that is used to attach shapes in the .shp file to property records in the .dbf DBase file.

I used the dbfread Python module to read the DBase file directly to see if it had a concept of an ID, and it doesn't appear to - suggesting that the index into the file is the only way to associate it with a shape:

In [1]: from dbfread import DBF                                                                                                                                          

In [2]: db = DBF("shape/scc_parkland_boundaries.dbf")                                                                                                                    

In [3]: next(iter(db))                                                                                                                                                   
Out[3]: 
OrderedDict([('PARK_NAME', 'Sanborn'),
             ('ADDRESS', '16055 Sanborn Rd.'),
             ('CITY_ZIP', 'Saratoga, CA 95070'),
             ('PRD_OPER', 'yes'),
             ('MNT_REG', 'Region 1'),
             ('RNGR_REG', 'Region 1'),
             ('SHAPE_Leng', 128308.882691),
             ('SHAPE_Area', 149564781.558),
             ('RespCostCn', '5809'),
             ('STATUS', 'open'),
             ('RuleID', 3),
             ('SHF_BEATS', 'P1'),
             ('PARK_ID', 'Sanborn'),
             ('ACRES', 3432.2485994),
             ('PARK_SUFFI', 'County Park')])

Finally, the ESRI shapefile specification at https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf says the following:

simonw · 2020-02-17T01:13:30Z

So... when importing a single shapefile into a single table it's safe for me to consider the id unique for the purpose of defining a primary key.

But how about loading multiple shapefiles into the same table? The tool supports this at the moment:

$ shapefile-to-sqlite features.db *.shp --table=features

But this won't work because of the duplicate primary keys.

This could be solved with a mode where the primary key becomes a string with a unique prefix for each incoming shapefile. Maybe it does that automatically if you attempt to load multiple shapefiles into the same table? Or maybe there's an option for that - which would be a bit more obvious, but people would have to remember to use it.

$ shapefile-to-sqlite features.db *.shp --table=features --prefix-pk

simonw · 2020-02-17T05:00:14Z

Now using .insert_all(..., replace=True) as of 22a5611

simonw · 2021-01-08T17:50:15Z

I just found a shapefile where the OBJECTID would make a better primary key:

OrderedDict([('OBJECTID', 1),
             ('CSAFP', '122'),
             ('CBSAFP', '12020'),
             ('GEOID', '12020'),
             ('NAME', 'Athens-Clarke County, GA'),
             ('NAMELSAD', 'Athens-Clarke County, GA Metro Area'),
             ('LSAD', 'M1'),
             ('MEMI', '1'),
             ('MTFCC', 'G3110'),
             ('ALAND', 2654601832),
             ('AWATER', 26140309),
             ('INTPTLAT', '+33.9439840'),
             ('INTPTLON', '-083.2138965'),
             ('Shape_Leng', 363485.4157899709),
             ('Shape_Area', 3905629939.0061007)])

https://data-usdot.opendata.arcgis.com/datasets/b0d0e777e2ad4b53803dbc0527c73d88_0

saulshanabrook · 2022-05-04T20:47:52Z

Hey @simonw, I am taking a look at this issue. I have resolved it locally using the --prefix-pk idea you mentioned, optionally prefixing the PK with the file path for each file you pass in.

Should I submit a PR for that?

simonw · 2022-05-04T20:55:57Z

Yes please! That diff looks good - just needs a test and a sentence of documentation.

simonw added the enhancement New feature or request label Feb 16, 2020

simonw added a commit that referenced this issue Feb 17, 2020

Always use id as pk, refs #4

22a5611

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Figure out primary keys and update/alter #4

Figure out primary keys and update/alter #4

simonw commented Feb 16, 2020

simonw commented Feb 17, 2020 •

edited

Loading

simonw commented Feb 17, 2020

simonw commented Feb 17, 2020

simonw commented Jan 8, 2021

saulshanabrook commented May 4, 2022 •

edited

Loading

simonw commented May 4, 2022

Figure out primary keys and update/alter #4

Figure out primary keys and update/alter #4

Comments

simonw commented Feb 16, 2020

simonw commented Feb 17, 2020 • edited Loading

simonw commented Feb 17, 2020

simonw commented Feb 17, 2020

simonw commented Jan 8, 2021

saulshanabrook commented May 4, 2022 • edited Loading

simonw commented May 4, 2022

simonw commented Feb 17, 2020 •

edited

Loading

saulshanabrook commented May 4, 2022 •

edited

Loading