This repository has been archived by the owner on Oct 16, 2024. It is now read-only.

Fix citation html for cites near page numbers. #2200

Merged
bensteinberg merged 1 commit into harvard-lil:develop on Feb 8, 2024

Conversation

@jcushman jcushman commented Feb 7, 2024

I think this avoids the HTML errors on pages like https://cite.case.law/ark/13/409/, where citation HTML gets mashed up with page-number HTML.

This would require running sync_case_body_cache for all volumes, which is a big intervention, so TBD if we want to do that. Possibly there's some way to filter to just cases with html display errors if we can find those.

@jcushman jcushman requested a review from a team as a code owner February 7, 2024 22:16
@jcushman jcushman requested review from tinykite and removed request for a team February 7, 2024 22:16

@bensteinberg bensteinberg left a comment

This needs an update to a test fixture, looks like.

@bensteinberg bensteinberg removed the request for review from tinykite February 8, 2024 13:05
@mdellabitta

FWIW, I used Beautiful Soup to do some forensics on the HTML of the smaller set of reporters, looking for cites that linked to other released reporters, and it wasn't too bad. Perhaps we could do something similar here and look for the string "href" in the text nodes of each document?
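
A minimal sketch of that text-node check, for illustration only. It uses the standard library's `html.parser` as a dependency-free stand-in for Beautiful Soup, and it assumes the display errors show up as escaped markup inside text. The names `TextNodeCollector` and `has_leaked_markup` and both sample strings are hypothetical, not taken from the codebase or from real case data.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect text nodes only; tag names and attributes are ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_nodes = []

    def handle_data(self, data):
        # Called for text between tags, with entities already decoded.
        self.text_nodes.append(data)


def has_leaked_markup(html, needle="href"):
    """Return True if any text node (not a tag attribute) contains `needle`."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return any(needle in node for node in parser.text_nodes)


# Hypothetical samples: a well-formed cite, and one whose link markup
# was escaped and ended up as visible text.
clean = '<p>See <a href="/ark/13/409/">13 Ark. 409</a>.</p>'
broken = '<p>See &lt;a href="/ark/13/409/"&gt;13 Ark. 409.</p>'

print(has_leaked_markup(clean), has_leaked_markup(broken))
```

Because a legitimate `href` lives in a tag attribute rather than in a text node, a well-formed link does not trigger the check; only markup that leaked into the visible text does.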

jcushman commented Feb 8, 2024

Annoyingly, the tests had detected an actual typo in my PR; I fixed the broken line.

@mdellabitta good idea, it looks like there are about 900 cases with "href" in them:

In [1]: CaseBodyCache.objects.filter(text__contains='href').count()
Out[1]: 907

I'm thinking as the next step we could push this change and try updating just those cases and spot check them to see if it worked.
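
One way to make the spot check repeatable is to draw a fixed-seed random sample from the ids of the affected cases. The sketch below assumes the ids have already been fetched (for example, from the query above). `pick_spot_checks`, `sample_size`, and `seed` are hypothetical names, not part of the codebase.

```python
import random


def pick_spot_checks(case_ids, sample_size=20, seed=0):
    """Return a reproducible, sorted random sample of ids to review by hand."""
    ids = sorted(case_ids)
    if len(ids) <= sample_size:
        # Few enough cases that all of them can be checked.
        return ids
    rng = random.Random(seed)  # fixed seed so reviewers see the same sample
    return sorted(rng.sample(ids, sample_size))


# Hypothetical ids standing in for the 907 matching cases.
sample = pick_spot_checks(range(1, 908), sample_size=20)
print(len(sample))
```

Fixing the seed means two people running the check look at the same cases, which makes it easier to compare notes.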

codecov bot commented Feb 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1f201a0) 63.62% compared to head (09dbcad) 63.68%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2200      +/-   ##
===========================================
+ Coverage    63.62%   63.68%   +0.05%     
===========================================
  Files          107      107              
  Lines        11803    11820      +17     
===========================================
+ Hits          7510     7527      +17     
  Misses        4293     4293              

@bensteinberg bensteinberg merged commit e08a1d0 into harvard-lil:develop Feb 8, 2024
2 checks passed
3 participants