maxachis
Contributor

@maxachis maxachis commented Oct 12, 2025

maxachis and others added 26 commits September 12, 2025 12:40
…_sc_agencies_search_location

mc_869_sc_agencies_search_location
…_sc_agencies_search_location

Finish deprecating endpoints
…423_remove_agency_location_endpoints

Deprecate logic
…_source_collector_meta_urls_post

mc_add_source_collector_meta_urls_post
…_source_collector_meta_urls_post

Add agency ID as return value in `/source-collector/meta-urls` `POST`
…_retire_internet_archives_endpoints

Retire Internet Archives endpoints
@josh-chamberlain
Contributor

@maxachis Almost ready to approve, just got some pedantry:

@maxachis
Contributor Author

@josh-chamberlain How does it look now? Any better?

@josh-chamberlain
Contributor

josh-chamberlain commented Oct 16, 2025

@maxachis better! The template descriptions tend to irk me.

  • I can't seem to access the dev API docs, which is a quick way to look at the endpoints—are those up somewhere that I just can't find them? Nothing seems amiss while inspecting the code but I'd like to look there.

  • During my testing, I found I am getting a "Something went wrong" error on /data-source/create. Are you seeing the same thing / anything in the logs? Ideally we'd set up the new endpoints in source-manager first, and wire everything up to those (pdap.io, retool) first, then deprecate from data sources app after that. You said "except for those affecting users." so maybe we are on the same page, but the way my brain works I can't really look closely without a nice rendered API docs screen 😬

  • I'm also seeing reduced counts of sources across the board, at least on the map. How sure are we that this is not due to different behavior of the endpoints vs data being out of sync between dev and main?

| location | .io | .dev |
| --- | --- | --- |
| CA | 457 | 446 |
| PA | 285 | 250 |
| OR | 28 | 27 |
| MO | 59 | 54 |
| Allegheny County | 240 | 216 |
| Philadelphia County | 19 | 18 |
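
To make the size of the gap easier to see, here is a minimal sketch that computes the drop from .io to .dev for each row of the table above. The counts are copied from the table; the names `MAP_COUNTS` and `count_deltas` are made up for illustration and are not part of either app.

```python
# Map counts copied from the table above as (pdap.io, pdap.dev).
# MAP_COUNTS and count_deltas are illustrative names, not project code.
MAP_COUNTS = {
    "CA": (457, 446),
    "PA": (285, 250),
    "OR": (28, 27),
    "MO": (59, 54),
    "Allegheny County": (240, 216),
    "Philadelphia County": (19, 18),
}


def count_deltas(counts):
    """Return {location: (absolute drop, percent drop)} going from .io to .dev."""
    deltas = {}
    for location, (io_count, dev_count) in counts.items():
        drop = io_count - dev_count
        deltas[location] = (drop, round(100 * drop / io_count, 1))
    return deltas


if __name__ == "__main__":
    for location, (drop, pct) in count_deltas(MAP_COUNTS).items():
        print(f"{location}: -{drop} ({pct}%)")
```

Every location drops, but not by a uniform share, which is part of why it is hard to tell an endpoint change apart from data drift.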

On actual search results (for example, Allegheny County), I seem to get the same counts between environments:

https://pdap.io/search/results?location_id=923#county
https://pdap.dev/search/results?location_id=923#county

local (181)
county (37)
state (5)
federal (42)

These numbers do not add up to either 240 or 216 in any configuration, so I am extra confused. Getting these counts right is something we can't seem to do consistently, so I'd at least like to make sure we are not introducing new bugs or counting things differently.
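
The "any configuration" check can be reproduced with a short sketch that tries every combination of the four search-result buckets against the two map totals. The numbers come from this comment; `SEARCH_COUNTS`, `MAP_TOTALS`, and both function names are illustrative only.

```python
from itertools import combinations

# Search-result buckets for Allegheny County (same in both environments).
SEARCH_COUNTS = {"local": 181, "county": 37, "state": 5, "federal": 42}
# Map counts for Allegheny County in each environment.
MAP_TOTALS = {"pdap.io": 240, "pdap.dev": 216}


def subset_sums(counts):
    """Map every non-empty combination of bucket names to the sum of its counts."""
    names = list(counts)
    sums = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            sums[combo] = sum(counts[name] for name in combo)
    return sums


def matching_subsets(counts, target):
    """Return the bucket combinations whose counts add up exactly to target."""
    return [combo for combo, total in subset_sums(counts).items() if total == target]


if __name__ == "__main__":
    for env, total in MAP_TOTALS.items():
        print(env, total, matching_subsets(SEARCH_COUNTS, total))
```

Neither total is matched by any combination. The nearest miss is local + county at 218, against 216 on .dev, and all four buckets together come to 265.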

@maxachis
Contributor Author

maxachis commented Oct 19, 2025

  • I can't seem to access the dev API docs, which is a quick way to look at the endpoints—are those up somewhere that I just can't find them? Nothing seems amiss while inspecting the code but I'd like to look there.

You can find them in two new locations!
https://data-sources.pdap.dev/api/v2/
https://data-sources.pdap.dev/api/v3/docs

The different addresses are partly a product of FastAPI and Flask technically being two different servers. There are also some typos in the nomenclature (a v3 endpoint says "v2", and the instructions in v2 for accessing v3 are wrong), so I'll need to correct that.

  • During my testing, I found I am getting a "Something went wrong" error on /data-source/create. Are you seeing the same thing / anything in the logs? Ideally we'd set up the new endpoints in source-manager first, and wire everything up to those (pdap.io, retool) first, then deprecate from data sources app after that. You said "except for those affecting users." so maybe we are on the same page, but the way my brain works I can't really look closely without a nice rendered API docs screen 😬

That one's an error on my part, and a product of me being overzealous in destruction and focusing on the primary user path. I can focus on taking care of setting up the endpoints for that.

  • I'm also seeing reduced counts of sources across the board, at least on the map. How sure are we that this is not due to different behavior of the endpoints vs data being out of sync between dev and main?

| location | .io | .dev |
| --- | --- | --- |
| CA | 457 | 446 |
| PA | 285 | 250 |
| OR | 28 | 27 |
| MO | 59 | 54 |
| Allegheny County | 240 | 216 |
| Philadelphia County | 19 | 18 |

On actual search results (for example, Allegheny County), I seem to get the same counts between environments:

https://pdap.io/search/results?location_id=923#county
https://pdap.dev/search/results?location_id=923#county

local (181)
county (37)
state (5)
federal (42)

These numbers do not add up to either 240 or 216 in any configuration, so I am extra confused. Getting these counts right is something we can't seem to do consistently, so I'd at least like to make sure we are not introducing new bugs or counting things differently.

The first thing I will note is that when I run the get_data_sources_for_map query here with `where l.id = 6593` added to the bottom (that's Pittsburgh's location ID), I get 110 rows in both prod and dev. That matches what we have in pdap.dev for Pittsburgh, but not in pdap.io. So I believe a counting bug has been resolved in the new version, at least for the map.

I do believe there is a bug in search/map counting, but one that has been present for a hot minute. I think that's an issue worth addressing, and addressing prior to pushing out this PR. I shall look into it 🧐.

Finally, I have a request for you! The GitHub sync endpoint will need an update from where it is now! We'll need to change /api to /api/v2 in the endpoint it references!
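
For illustration only, the /api to /api/v2 change could look something like the sketch below. The helper name `to_v2_path` and the example.org URLs are hypothetical; the real endpoint the GitHub sync references is not shown here, and in practice this is probably just a one-line config change.

```python
from urllib.parse import urlsplit, urlunsplit


def to_v2_path(url: str) -> str:
    """Rewrite a leading /api path prefix to /api/v2.

    URLs whose path is already versioned (/api/v2 or /api/v3) or does not
    start with /api are returned unchanged. Hypothetical helper.
    """
    parts = urlsplit(url)
    # An absolute path splits as "/api/x" -> ["", "api", "x"].
    segments = parts.path.split("/")
    if len(segments) > 1 and segments[1] == "api":
        already_versioned = len(segments) > 2 and segments[2] in {"v2", "v3"}
        if not already_versioned:
            segments.insert(2, "v2")
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    # Placeholder URL, not the real sync target.
    print(to_v2_path("https://example.org/api/some-endpoint"))
```

Checking for an existing version segment keeps the rewrite safe to run twice.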

@maxachis
Contributor Author

@josh-chamberlain I have figured out and resolved one issue with the map/location discrepancy, but the larger issue is promiscuous attributes (such as jurisdiction type), which are used inconsistently and which also overlap with other forms of categorization. That requires a larger focus on adding checks for data integrity, which I describe in Police-Data-Accessibility-Project/data-source-manager#502.

Development

Successfully merging this pull request may close these issues.

Make source manager the "source of truth" for the database
