FIPS codes are supposed to be 5-digit strings, with leading 0s if relevant. A few errors I've found:
MA, ME, NH, RI, CT, VT, WI, and MI all have 10-digit jurisdiction FIPS codes, e.g. one value is "0900305910". Hartford County, CT has a FIPS code of "09003", so that looks like the prefix, but I don't know where the suffix comes from. There are a lot of instances of this, so perhaps it isn't an error, in which case sorry for the noise!
CA, AL, and AR all contain 4-digit county_fips and jurisdiction_fips values. They were probably read in without a specific dtype, so that e.g. pandas interpreted the column as an int and dropped the leading zero.
RI contains county_fips and jurisdiction_fips of "NA" and "NAN"
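For illustration, here is a rough sketch of the kind of check that would flag the three patterns listed above. It is only a sketch under assumptions: the column names `county_fips` and `jurisdiction_fips` come from this issue, while the accepted lengths, the list of missing-value markers, and the function names are hypothetical and not taken from the project's own scripts. It uses only the standard library; if the files are loaded with pandas instead, passing `dtype=str` to `read_csv` keeps the leading zeros in the first place.

```python
import csv
import io
import re

# Assumed column names (from the issue text) and the digit counts accepted for each.
ALLOWED_LENGTHS = {
    "county_fips": {5},
    "jurisdiction_fips": {5, 10},
}
# Hypothetical list of placeholder strings that should count as missing.
MISSING_MARKERS = {"", "NA", "NAN", "NULL", "NONE"}


def classify_fips(value, allowed_lengths):
    """Return 'ok', 'missing', 'non_numeric', or 'bad_length' for one raw value."""
    text = (value or "").strip()
    if text.upper() in MISSING_MARKERS:
        return "missing"
    if not re.fullmatch(r"[0-9]+", text):
        return "non_numeric"
    if len(text) not in allowed_lengths:
        return "bad_length"
    return "ok"


def find_fips_problems(csv_file):
    """Collect (line number, column, raw value, status) for every suspect cell."""
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(csv.DictReader(csv_file), start=2):
        for column, lengths in ALLOWED_LENGTHS.items():
            status = classify_fips(row.get(column), lengths)
            if status != "ok":
                problems.append((line_number, column, row.get(column), status))
    return problems


# Made-up rows that mimic the patterns described in this issue.
sample = io.StringIO(
    "state,county_fips,jurisdiction_fips\n"
    "CT,09003,0900305910\n"
    "CA,6001,6001\n"
    "RI,NA,NAN\n"
)
for problem in find_fips_problems(sample):
    print(problem)
```

Treating 10-digit values as valid only in `jurisdiction_fips` is a guess on my part about how the two columns differ.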
I am still very happy to write some testing/QA scripts for your exported .csvs that might catch some of these common errors, please let me know if that would be useful.
Thank you!
You're right about the 4-digit FIPS codes, thanks for that! And clearly RI got caught with a data type error. I'll add those latter 4 states to the list to check. The 10-digit FIPS codes are the official municipality or township-level FIPS codes, which are the appropriate designator of jurisdiction in states that do not administer elections at the county level. I'll look into the Hartford County situation, if that's a county FIPS code it's probably fine but if it's a jurisdiction FIPS code it should probably have a suffix.
On the topic of QA, related to conversations we've had about this in two other Issues, we have scripts that both automatically apply padding to coerce the FIPS code into a 5-digit zero-padded string and then check for every issue you've raised, raising a flag if there is a problem. But QA on a dataset like this is a very involved process with a lot of flags. As I know you well understand since you've mentioned it in the past, consistently catching every subtle data type issue in a nearly 15 million row dataset with regular updates is a matter of more than just having a QA script. I really appreciate when you raise data issues that we can address. But this is fair warning that I won't engage with further Issues that continue to imply that we don't do QA -- we spend a very long time doing very extensive QA.
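For readers following along, the padding step described above might look roughly like the sketch below. This is not the project's actual script: the function name `pad_fips`, the handling of floats, and the rule of restoring at most one leading zero (4 digits to 5, or 9 digits to 10) are assumptions made for illustration. The one-zero rule relies on state FIPS codes starting at 01, so a valid code can lose at most one leading zero when parsed as a number.

```python
import re


def pad_fips(value):
    """Return a zero-padded FIPS string, or None if the value cannot be repaired.

    Hypothetical helper: accepts 5- or 10-digit codes as-is and restores a
    single dropped leading zero on 4- or 9-digit values.
    """
    if value is None:
        return None
    text = str(value).strip()
    # A numeric column containing gaps is often parsed as float, e.g. 6001.0.
    if text.endswith(".0"):
        text = text[:-2]
    if not re.fullmatch(r"[0-9]+", text):
        return None  # placeholders such as "NA" or "NAN" end up here
    if len(text) in (5, 10):
        return text
    if len(text) in (4, 9):
        return text.zfill(len(text) + 1)
    return None  # any other length is flagged rather than guessed at


for raw in ["6001", 6001, 6001.0, "09003", "900305910", "NAN", "123"]:
    print(repr(raw), "->", pad_fips(raw))
```

Returning `None` instead of padding every short value keeps the repair conservative, so unexpected lengths surface as flags for manual review.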