Big Source Changes, Much Improvements, Wow. #1545

ahankinson · 2024-06-18T14:20:32Z

This PR contains a whole boatload of improvements. In talking with @dchiller about this we thought we would do a monolithic PR to start with, and then see if anything needed to be changed before merging.

At its core this PR changes the way Source records are captured. With the introduction of the Institution model, the siglum and the institution name are now found in that table, and the Source identifier (the shelfmark) is replacing the former title and siglum fields on the Source model.

Along the way, a number of other changes were made.

The 'base.html' template gained two new blocks, 'title' and 'script', and all the <title> and <script> tags were moved to their appropriate blocks
A new experimental approach to Private Collectors is introduced, by marking any given institution as a private collector. This still needs to be fleshed out, though.
The shelfmark field was added to Sources
A hacky migration script was added to convert most of the Source records to the new structure. This still needs adjusting for Private collectors
The forms and templates for the source edit and create were altered, including a new autocomplete for assigning the holding institution
Optimized a bunch of queries and rewrote a couple to remove some code
Removed the siglum column from the source chant browse

The biggest change, however, is the rework of the Feast detail page. The previous code was very slow to load the page, but this is now pretty fast through the use of two raw SQL queries to fetch the necessary data for the view. I've compared it with the numbers from the existing setup, and it seems to give the same results, but the page load time has gone from 3+ seconds to ~250ms, and 1,500 queries (in one case) to a pretty constant 8 (in total for the page load). It also fixed a few problems with this page, and displays all the genres associated with a given chant, instead of just one.

How to review this PR

Normally I don't like large PRs, but it seemed appropriate that, since I was touching so many places to rework sources that I could also improve these places as I went.

I've tried to explain the changes thoroughly in every commit message. I'm also posting a few screenshots to compare the old and new changes.

Outstanding issues

Still to-do is to figure out how to deal with Private Collections. You may notice this in some places where a source will have None -- this just means there is a source, but it hasn't been migrated to the Institution / Shelfmark model, so it's not displaying the right data. This should be fixed before this PR is merged.

Also not tackled is any search interface, specifically how it deals with looking up sources by "siglum." Still TBD, particularly if we want to not break any existing searches... Feedback is useful here.

Screenshots

Changes on the left, Current Production on the right.

Feast detail. Note the genre column, and the removal of the chant with no cantus ID.

Browse Source chants, with removed siglum column

Edit Source, with Holding institution and shelfmark

adds blocks for title and scripts so that templates can put those parts in the header.

The title and scripts are no longer simply added to the body of the page, but are now actually contained by the header, where they should be.

Since the siglum and institution name now come from the institution, the source only needs to have the shelfmark. However, until all the data is properly migrated, the old title and siglum fields will remain.

This is a hacky, change-as-needed script that serves to migrate the source data from the old structure to the new, creating institutions and moving the shelfmark data to the new sources. There is a whole bunch of sigla that get skipped (essentially all private collectors) until we figure out how they will be handled. Don't look to closely, since there are lots of skeletons in this closet.

Allows sources to be added to an institution by using the autocomplete widget

Previously, the LoginView import in the URLs was taken from the views.user module, instead of directly from the django auth views, because this view was also imported into the user module. (aka, a transitive import). This changes it to import from the auth views directly. I also ran a formatter on the imports in the user module, so that's why it looks like there are more changes than there actually are!

Previously there were several duplicated property definitions that varied only by checking whether the user was logged in. They also did the sorting in Python, rather than in the database. This condenses all the definitions to a single one per property, and uses the expanding empty dict trick to determine whether `published=True` gets added to the filters. (Since without this, what is really desired is `published = True OR False` but without resorting to Q queries this was the simplest way to do it.)

This commit completely reworks the fetching of data for the feast detail page. Previously, this page took several seconds to load, and generated a large number of queries, due to the complexity of the information needing to be collated. This was rewritten to instead use two custom SQL queries that are executed directly. This means that the number of queries generated by this segment of code is now no more than two. There are a few 'interesting' things in this, which I've tried to note in comments. In the process, this also fixes a number of other small bugs: Fixes #1540 Fixes #1136 Fixes #596

This commit reworks the source display in the templates, primarily by calling a couple helper methods on the Source model to keep the referencing consistent. The 'short heading' method displays the Institution siglum and the shelfmark, while the 'heading' displays the full institution name and the shelfmark. The edit and create templates were reworked for the new structure and source referencing. The source display will now show the institution and the shelfmark in separate columns.

This is to allow saving sources in the create / edit without triggering a validation error.

Fixes #1535

annamorphism · 2024-06-18T15:21:41Z

oo, shiny!
Comments on the first screenshot (Github is not letting me embiggen the other two right now)

The "chants"/"incipit" columns seem to be flipped in the "Most frequent chants" table
I gather that the source "None None" is something to still be dealt with, but where has the Source Formerly Known as P-Pm 1151 gone? And why did the HR-Zmk and PL-WRu seemingly lose a chant each?

annamorphism · 2024-06-18T15:39:13Z

further screenshot commentary:
-removing the siglum from the browse chant table looks great! excellent change.
-Edit source page: I see it is now optional to give a source a siglum+shelfmark. If somebody leaves this blank, what happens? Could they create a whole bunch of sources all named "None None"?
-Relatedly, what happens if the editor can't find their institution in the dropdown list (is this all the institutions in the database? in RISM?) Do they have to contact somebody to add it?

ahankinson · 2024-06-18T18:49:57Z

The "chants"/"incipit" columns seem to be flipped in the "Most frequent chants" table

Good catch! I've fixed it.

I gather that the source "None None" is something to still be dealt with, but where has the Source Formerly Known as P-Pm 1151 gone? And why did the HR-Zmk and PL-WRu seemingly lose a chant each?

I'm working from a slightly older DB export at the moment, so it may not be completely up-to-date. P-Pm has the title "Porto 1511", which didn't match the "city, institution, shelfmark" pattern, so it got skipped in the conversion. It shows up as None None because of this.

Edit source page: I see it is now optional to give a source a siglum+shelfmark. If somebody leaves this blank, what happens? Could they create a whole bunch of sources all named "None None"?

It's only optional during the transition, since we can't automatically move everything and leave nothing blank. Once we've migrated everything and it "goes live" the option to leave it blank will be removed.

Relatedly, what happens if the editor can't find their institution in the dropdown list (is this all the institutions in the database? in RISM?) Do they have to contact somebody to add it?

That's kinda up to you guys. Do you want to allow people to add their own institutions, or would you rather create them on request. (I suspect the latter.) The only weird bit is Private Collectors, but I'm still working on a proposal for that.

Thanks for the review!

Buckle up, this one's a bit messy. This commit disables the Feast Detail tests. It turns out, after much searching and Googling and digging, that the Django Test Suite always wraps the tests in a transaction, so that the test can be easily rolled back after each one is run, and thus present a database in a known state for each individual test. However, with the switch to running a raw SQL query, the cursor in the Feast Detail View now runs outside of this transaction. Thus to the View, the database always looks empty, even if it's running inside of the test. I tried fixtures, TransactionTestCase, overriding methods and hacking the TestCase to prevent this behaviour, but no dice. So for now I've marked the Feast Detail tests as "skip" so that they still appear in the runs, but are skipped because they will always fail, as they're written.

It's better to use the named fields for the column content than to rely on the order of the tuple unpacking....

Still failing, but being worked on. Committing here to merge in latest changes.

# Conflicts: # django/cantusdb_project/main_app/templates/browse_chants.html # django/cantusdb_project/main_app/templates/chant_create.html # django/cantusdb_project/main_app/templates/chant_detail.html # django/cantusdb_project/main_app/templates/chant_edit.html # django/cantusdb_project/main_app/templates/melody_search.html # django/cantusdb_project/main_app/templates/source_detail.html # django/cantusdb_project/main_app/templates/source_edit.html # django/cantusdb_project/main_app/templates/user_source_list.html

Syncing the templates with the latest develop generated a number of merge conflicts. This commit fixes them. Also it fixes the new block definitions in the base templates. Fixes #1551

dchiller · 2024-07-15T15:42:43Z

The provenance and century detail pages have odd orderings now when changes to templates and views are combined. I'll make a new issue to match what we now do with notation views.

ahankinson · 2024-07-15T16:11:53Z

The preferred ordering was unclear to me. In many places it was done by siglum, so the Austrian (A-) sources would sort first, then the Belgian (B-) etc.

This is fine, but the format for the full heading doesn't actually contain the siglum, only the full institution name and the shelfmark.

I could imagine something like Cividale del Friuli, Archivio Capitolare (I-CF), LVI where the RISM siglum is in parentheses at the end of the institution name. This is what we do in RISM Online, e.g., https://rism.online/institutions/30000053, "Leipziger Stadtbibliothek - Musikbibliothek, Leipzig (D-LEm)".

Then the sorting of the list would make sense, even though the siglum isn't the first component of the title. If that is what is wanted, then "(I-CF) Cividale del Friuli, Archivio Capitolare" is probably best.

django/cantusdb_project/main_app/views/source.py

django/cantusdb_project/main_app/tests/test_views.py

django/cantusdb_project/main_app/templates/source_create.html

django/cantusdb_project/main_app/templates/source_edit.html

django/cantusdb_project/main_app/admin/institution.py

dchiller · 2024-07-15T16:45:47Z

The preferred ordering was unclear to me.

To me too, which is why I think it makes sense to separate. As it is now, it can just look totally random which probably doesn't make sense.

The second option you list (eg. (I-CF) Cividale del Friuli, Archivio Capitolare) is how it now looks on the notation detail page which I think makes sense.

annamorphism · 2024-07-15T16:49:57Z

The preferred ordering was unclear to me.

i believe this was discussed a bit here? #1227

dchiller · 2024-07-15T16:50:44Z

@ahankinson One last question, I think:

In the description above you write that there is still work to be done to figure out "private collections" and that this will lead to some sources showing up as None. Locally, I'm not actually running into any trouble with private collections at the source level...I'm running into them at the "Institution" level.

I'm getting institutions that look like this:

I'm also getting other weird institutions. For example, here's the US library of Congress:

I assume that this latter has to do with the way the names of sources were in the original data. If the intention is that we'll have to do this manual clean-up, all fine, I just want to make sure that's expected.

dchiller · 2024-07-15T16:59:18Z

i believe this was discussed a bit here? #1227

Yes, now that I look at it, the source list page suffers from this too now (ordering by something that isn't visible). I guess whatever we choose for the source list page should probably carry over to the other pages that list sources too. Title will no longer be an option so we'll have to update what we want in #1227.

ahankinson · 2024-07-15T17:39:25Z

A “title” doesn’t really make sense when talking about MSS… at least it’s not a universally recognized concept. The identifiers we can sort on are:

shelfmark (but we don’t want all the “MS 1” to sort together so that’s not useful
Institution name
City (but may not be known for private collectors)
Siglum

Personally I find sorting by country code, then city, and then institution name to be the most useful, since it groups all the mss from the same institution, and groups all institutions in one city. As long as it’s understood how they’re sorted then that’s probably the best way.

dchiller · 2024-07-16T13:03:05Z

Thanks @ahankinson for all of this!!

From my pov, we should merge this if you are good and this makes sense #1545 (comment)

ahankinson · 2024-07-16T13:41:27Z

I just started afresh and re-ran the migration. Here's what I get:

Did you run python manage.py migrate_records? The other one, migrate_institution_source_records, got so complicated that I rewrote it but kept the old version for reference.

dchiller · 2024-07-16T13:43:19Z

I just started afresh and re-ran the migration.

Interesting...I thought I did but let me try again.

dchiller · 2024-07-16T14:55:49Z

The same thing happened to me again... @lucasmarchd01 have you tried?

Update 1: I didn't realize that @ahankinson had done significant updates to the data on production to accompany this PR. Local and staging databases need to be updated with a recent production dump for this to work.

Update 2:
There are just a few institutions that probably need to be "fixed" in the data. Should we before merging?

ahankinson · 2024-07-16T15:19:49Z

I don't see any of those institutions in my data...

dchiller · 2024-07-16T15:29:58Z

I don't see any of those institutions in my data...

Looks like all are recent (past month) additions. I'm gonna merge, we'll fix them in the data later.

ahankinson · 2024-07-16T18:30:36Z

I did a fresh export from the cantus database today (the export is currently in /home/fedora on the server) and dropped / re-created the database on my local machine. So I'm not sure how they got there...

cd /home/cantusdb/code/CantusDB/postgres/
docker compose exec -T postgres pg_dump -U cantusdb cantusdb > /home/fedora/cantusdb.sql
cd /home/fedora/
tar -zcvf cantusdb.tgz cantusdb.sql

Anything wrong with that?

annamorphism · 2024-07-16T19:04:29Z

Looks like all are recent (past month) additions. I'm gonna merge, we'll fix them in the data later.

I think they are all part of a dump of private collection fragments I'm working on that's still very much in draft form! So that's me.

ahankinson · 2024-07-16T21:53:38Z

But... if they're not in the database dump I literally took from the production server today, where did they come from? Or am I getting the dump from the right place?

Edit: OK, I see them on production. They will need to be fixed post-migration.

dchiller · 2024-07-16T21:58:53Z

I did a fresh export from the cantus database today (the export is currently in /home/fedora on the server) and dropped / re-created the database on my local machine.

Looks like we have this figured out, but for reference, a new database dump is created every day in /home/cantusdb/backups/ so you don't need to do it yourself if you want the whole database.

ahankinson · 2024-07-16T21:59:54Z

OK!

This commit adjusts the columns in the browse sources table to conform to the changes in source identification introduced in DDMAL#1545. Refs: DDMAL#1569

ahankinson added 14 commits June 17, 2024 21:12

Fixed: Adjust base template

6e26435

adds blocks for title and scripts so that templates can put those parts in the header.

Fixed: Move titles and scripts to their own block

01a1dea

The title and scripts are no longer simply added to the body of the page, but are now actually contained by the header, where they should be.

New: mark institutions as private collectors

d4a58e1

New: Add shelfmark to sources

b47f9c4

Since the siglum and institution name now come from the institution, the source only needs to have the shelfmark. However, until all the data is properly migrated, the old title and siglum fields will remain.

New: Holding institution autocomplete

113e3ef

Allows sources to be added to an institution by using the autocomplete widget

New: Add a split string helper tag

ee785b3

Add migrations

73d5df4

Change source title and siglum to optional

1168e29

This is to allow saving sources in the create / edit without triggering a validation error.

Fixed: Remove siglum column

8fa654d

Fixes #1535

ahankinson requested review from dchiller, annamorphism and lucasmarchd01 June 18, 2024 14:21

ahankinson changed the title ~~Big Source Changes, Much Improvements, Wow.~~ WIP: Big Source Changes, Much Improvements, Wow. Jun 18, 2024

ahankinson added 9 commits June 19, 2024 14:48

Fixed: Small style update

c41b458

Fixed: Wrong order of columns

f507cb5

It's better to use the named fields for the column content than to rely on the order of the tuple unpacking....

Fixed: Update source display in full inventory

c5bd3f5

WIP: Updates to the tests

9f5203a

Still failing, but being worked on. Committing here to merge in latest changes.

Merge branch 'develop' into mega-changes-source-rework

85ba222

Update display of source info

0100edb

Fixed: Re-sync templates

faa90ed

Syncing the templates with the latest develop generated a number of merge conflicts. This commit fixes them. Also it fixes the new block definitions in the base templates. Fixes #1551

lucasmarchd01 self-requested a review July 15, 2024 13:13

lucasmarchd01 approved these changes Jul 15, 2024

View reviewed changes

Format imports in urls.py

b9c5d77

dchiller reviewed Jul 15, 2024

View reviewed changes

ahankinson added 4 commits July 16, 2024 09:29

Remove commented template code

2e3c429

Fixed: String representation of Sources

a174634

Fixed: Use holding institution siglum

f0c6fa2

Fixed: Unnecessary select_related items

694e376

dchiller merged commit ecf901c into develop Jul 16, 2024
1 check passed

dchiller mentioned this pull request Jul 16, 2024

Merge develop into staging; 16 July 2024 #1565

Merged

lucasmarchd01 mentioned this pull request Aug 14, 2024

Merge staging into production, 14 Aug 2024 #1601

Merged

67 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Big Source Changes, Much Improvements, Wow. #1545

Big Source Changes, Much Improvements, Wow. #1545

ahankinson commented Jun 18, 2024

annamorphism commented Jun 18, 2024

annamorphism commented Jun 18, 2024

ahankinson commented Jun 18, 2024

dchiller commented Jul 15, 2024 •

edited

Loading

ahankinson commented Jul 15, 2024

dchiller commented Jul 15, 2024

annamorphism commented Jul 15, 2024

dchiller commented Jul 15, 2024 •

edited

Loading

dchiller commented Jul 15, 2024

ahankinson commented Jul 15, 2024

dchiller commented Jul 16, 2024

ahankinson commented Jul 16, 2024

dchiller commented Jul 16, 2024

dchiller commented Jul 16, 2024 •

edited

Loading

ahankinson commented Jul 16, 2024

dchiller commented Jul 16, 2024

ahankinson commented Jul 16, 2024 •

edited

Loading

annamorphism commented Jul 16, 2024

ahankinson commented Jul 16, 2024 •

edited

Loading

dchiller commented Jul 16, 2024

ahankinson commented Jul 16, 2024

Big Source Changes, Much Improvements, Wow. #1545

Big Source Changes, Much Improvements, Wow. #1545

Conversation

ahankinson commented Jun 18, 2024

How to review this PR

Outstanding issues

Screenshots

annamorphism commented Jun 18, 2024

annamorphism commented Jun 18, 2024

ahankinson commented Jun 18, 2024

dchiller commented Jul 15, 2024 • edited Loading

ahankinson commented Jul 15, 2024

dchiller commented Jul 15, 2024

annamorphism commented Jul 15, 2024

dchiller commented Jul 15, 2024 • edited Loading

dchiller commented Jul 15, 2024

ahankinson commented Jul 15, 2024

dchiller commented Jul 16, 2024

ahankinson commented Jul 16, 2024

dchiller commented Jul 16, 2024

dchiller commented Jul 16, 2024 • edited Loading

ahankinson commented Jul 16, 2024

dchiller commented Jul 16, 2024

ahankinson commented Jul 16, 2024 • edited Loading

annamorphism commented Jul 16, 2024

ahankinson commented Jul 16, 2024 • edited Loading

dchiller commented Jul 16, 2024

ahankinson commented Jul 16, 2024

dchiller commented Jul 15, 2024 •

edited

Loading

dchiller commented Jul 15, 2024 •

edited

Loading

dchiller commented Jul 16, 2024 •

edited

Loading

ahankinson commented Jul 16, 2024 •

edited

Loading

ahankinson commented Jul 16, 2024 •

edited

Loading