Standardize "candidate" field across elections #9
Comments
Thanks very much for the attentive comments and the helpful code snippets and visualization. The reason that candidate names are not completely standardized across elections in our precinct data offerings is a subset of a larger fact: joining across years in general is not supported with precinct data. Precincts are very often not the same across years, but it is extremely difficult to tell when they have changed and when they have stayed the same. For further explanation, please see the paper we wrote introducing these datasets:
In contrast, our county-level and district-level datasets are designed to be consistent over time, because they describe a unit of geography that is the same over time, so we do make sure to standardize properties like candidate names across elections in those datasets. Now, I should say that it's easy to agree that we could do simple things like always write "WRITE-IN" instead of "WRITEIN" in every election year, to save users from having to remember to type two different things when they're working with two of our different election datasets. This kind of complete standardization is definitely on our to-do list, but we're a small team, and I don't have a specific date when we plan to take it on. However, insofar as the goal is to facilitate legitimate ways of joining precinct data across elections, those are an edge case -- most ways of joining precinct data across elections require much more data, like national municipal re-precincting records, than we are able to offer.
Thanks for the quick responses on all of these. I agree joining between precincts is out of scope. I'm not trying to do that, I'm just looking for the WRITEIN vs WRITE-IN standardization (even within 2022 both of these values appear). Can you please reconsider this much smaller feature? I can even submit a PR that does this if you want.
Oh absolutely. This is another symptom of the fact that I haven't yet had the time to tackle the cross-state standardization that I'm planning for April-May, and at that stage I absolutely should standardize WRITEIN, YES/NO, etc. I'll re-open with the scope of standardizing across states within 2022. I do agree that it would be good to standardize that kind of thing across years too, but I can't commit to a timeline on that. I can add a note about that into the readme though.
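For anyone who needs a stopgap on their own copy of the data, the WRITEIN vs WRITE-IN cleanup discussed above could look roughly like the sketch below. It is not the maintainers' method. The function name `normalize_candidate`, the set of spellings the pattern accepts, and the choice of "WRITE-IN" as the canonical form are all assumptions made for illustration.

```python
import re

# Assumption: every spelling matched here means "write-in". The real data
# may contain other variants that this pattern does not cover.
_WRITE_IN_PATTERN = re.compile(r"^WRITE[\s\-_]*INS?$")


def normalize_candidate(name: str) -> str:
    """Map write-in spellings to one label; otherwise only tidy the name.

    Hypothetical helper: it upper-cases the name and collapses whitespace,
    then rewrites bare write-in labels to "WRITE-IN".
    """
    cleaned = " ".join(name.strip().upper().split())
    if _WRITE_IN_PATTERN.match(cleaned):
        return "WRITE-IN"
    return cleaned
```

Note that this only touches values that are nothing but a write-in label. A value that also carries a person's name is passed through unchanged, since merging those would be a judgment call.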
I am combining this data with the data from 2018 and 2020:
After unioning 2018, 2020, and 2022, the encodings for certain candidates are inconsistent:
this gives:
Can we combine or make these more consistent between all these years?
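One rough way to list which encodings disagree is to group the unioned rows by a loose key (letters and digits only) and report any key that has more than one spelling. This is a sketch in plain Python, not the code used in this issue. The function name `find_spelling_variants` and the sample rows are made up for illustration and are not taken from the datasets.

```python
from collections import defaultdict


def find_spelling_variants(rows):
    """Return {loose_key: {spelling: sorted years}} for keys with 2+ spellings.

    `rows` is an iterable of (year, candidate) pairs. The loose key keeps
    only letters and digits, so "WRITE-IN" and "WRITEIN" collide on purpose.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for year, candidate in rows:
        key = "".join(ch for ch in candidate.upper() if ch.isalnum())
        seen[key][candidate].add(year)
    return {
        key: {spelling: sorted(years) for spelling, years in spellings.items()}
        for key, spellings in seen.items()
        if len(spellings) > 1
    }


# Made-up sample rows, only to show the shape of the input.
sample_rows = [
    (2018, "WRITEIN"),
    (2020, "WRITE-IN"),
    (2022, "WRITEIN"),
    (2022, "WRITE-IN"),
    (2022, "YES"),
]
variants = find_spelling_variants(sample_rows)
```

The loose key will also merge names that differ only in punctuation, so the result is a list of candidates for review rather than a list of confirmed duplicates.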
Perhaps it would be useful to have a more explicit encoding for ballot measures? Like a column "kind" that is either "CANDIDATE" or "BALLOT MEASURE"? IDK, maybe overkill. If the existing column is consistent, then users can assume a row with a candidate of "YES" or "NO" means a ballot measure.
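If the "candidate" column were consistent, the "kind" column could be derived on the user side along these lines. This is only a sketch of the idea. The names `classify_kind` and `add_kind` are hypothetical, and it assumes ballot-measure rows always use exactly "YES" or "NO", which may not hold for every state or year.

```python
# Assumption: ballot-measure rows are exactly those whose candidate value is
# YES or NO. Other option labels, if the data uses any, would need adding.
BALLOT_MEASURE_CHOICES = {"YES", "NO"}


def classify_kind(candidate: str) -> str:
    """Return "BALLOT MEASURE" for YES/NO rows and "CANDIDATE" otherwise."""
    if candidate.strip().upper() in BALLOT_MEASURE_CHOICES:
        return "BALLOT MEASURE"
    return "CANDIDATE"


def add_kind(rows):
    """Return copies of the row dicts with a derived "kind" field added."""
    return [{**row, "kind": classify_kind(row["candidate"])} for row in rows]
```

For example, `add_kind([{"candidate": "YES"}])` yields a row whose "kind" is "BALLOT MEASURE". A person whose recorded name happened to be "YES" or "NO" would be misclassified, which is one argument for an explicit column in the published data.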