-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor JournalAbbreviationRepository class to use on-disk maps #12606
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.
} | ||
|
||
public void addCustomAbbreviation(Abbreviation abbreviation) { | ||
Objects.requireNonNull(abbreviation); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this comment removed? Please readd it provides helpful insights
private final String abbreviation; | ||
private transient String dotlessAbbreviation; | ||
private final String dotlessAbbreviation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these no longer marked as transient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous solution returned null for the name
and dotlessAbbreviation
fields because they were marked as transient. As a result, these values were lost during deserialization from the h2 database, and new Abbreviation objects were created using only the key and abbreviation. I now need the deserialization process to correctly restore these fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit tests failed, despite the test passes on my local, I think the transient fields may be the problem, I will try with them again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no clue, why I introduced them - I think with the new system, there should be no need for it.
private final TreeSet<Abbreviation> customAbbreviations = new TreeSet<>(); | ||
|
||
/** | ||
* Initializes the internal data based on the abbreviations found in the given MV file | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readd the javadoc
} | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wha did you remove all the javadoc comments?
@@ -14,111 +12,72 @@ | |||
import org.h2.mvstore.MVMap; | |||
import org.h2.mvstore.MVStore; | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not remove the javadoc comments!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry for the oversight. While testing a solution—where I created the MV maps in JournalAbbreviationRepository—I added some debugging comments. Unfortunately, I accidentally removed them during the cleanup. I will roll back those changes.
} | ||
|
||
private void openMaps(MVStore store) { | ||
String fullToAbbreviationMapName = "FullToAbbreviation"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extract all the strings as constanst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean to make these as final fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private static final
public Optional<Abbreviation> get(String input) { | ||
// Clean up input: trim and unescape ampersand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.
I'm getting an out-of-memory error on the unit tests even though they passed locally. The error also mentioned that some tests failed, but when I tried them locally, they passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output.
You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR refactors the JournalAbbreviationRepository to use on-disk MVMap structures instead of in-memory HashMaps, reducing memory overhead and execution time. It also updates JournalListMvGenerator accordingly and adjusts the Abbreviation class to mark some fields as transient final.
- Refactored repository constructors and map usage in JournalAbbreviationRepository.
- Updated JournalListMvGenerator to populate MVMaps with cache configuration.
- Modified the Abbreviation class to declare transient fields as final.
Reviewed Changes
File | Description |
---|---|
src/main/java/org/jabref/logic/journals/JournalAbbreviationRepository.java | Switched from in-memory maps to MVMap and updated retrieval logic. |
src/main/java/org/jabref/cli/JournalListMvGenerator.java | Updated map initialization and added caching for MVStore. |
src/main/java/org/jabref/logic/journals/Abbreviation.java | Declared transient fields as final to support immutability. |
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/main/java/org/jabref/logic/journals/JournalAbbreviationRepository.java:24
- [nitpick] The constant name 'ABBREVIATION_TO_ABBREVIATION_MAP_NAME' does not clearly align with the corresponding variable 'abbreviationToNameMap'. Consider renaming the constant to 'ABBREVIATION_TO_NAME_MAP_NAME' for improved clarity.
private static final String ABBREVIATION_TO_ABBREVIATION_MAP_NAME = "AbbreviationToName";
src/main/java/org/jabref/logic/journals/Abbreviation.java:10
- Marking the 'name' field as both transient and final may lead to issues during deserialization since final transient fields are not reinitialized. Consider removing the final modifier or implementing a proper deserialization mechanism (e.g., a readResolve method) if deserialization is required.
private transient final String name;
private transient final String name; | ||
private final String abbreviation; | ||
private transient String dotlessAbbreviation; | ||
private transient final String dotlessAbbreviation; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, declaring 'dotlessAbbreviation' as transient final can cause deserialization problems since it won’t be restored automatically. Consider removing the final modifier or handling its restoration during deserialization.
Copilot is powered by AI, so mistakes are possible. Review output carefully before use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this fully works.
Can the Abbreviation
object be removed completely removed from the MV Store - and only used when queried at for the AbbreviationViewModel
?
@@ -45,8 +45,13 @@ public static void main(String[] args) throws IOException { | |||
MVStore store = new MVStore.Builder(). | |||
fileName(journalListMvFile.toString()). | |||
compressHigh(). | |||
cacheSize(128). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed? Add a Java comment. If unsure, remove this. - The generator is run only once and the map closed upon run completion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was taking around 1.1GB of memory, I thought the out-of-memory error was from here, so I tried to reduce the cache size of the MVStore. So I will remove it.
src/main/java/org/jabref/logic/journals/JournalAbbreviationRepository.java
Show resolved
Hide resolved
} | ||
|
||
/** | ||
* Initializes the repository with demonstration data. Used if no abbreviation file is found. | ||
*/ | ||
public JournalAbbreviationRepository() { | ||
// this will persist in memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found in the H2 database docs that when we don't provide a path to the store, the whole maps of the store will persist in the memory. So I wanted to mention this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a link to the documentation.
Even in 2025, the IDEs do not inline comments in PRs in code comments. Thus, we need to do this for ourselves.
return fullToAbbreviationObject.values(); | ||
List<Abbreviation> values = new ArrayList<>(); | ||
fullToAbbreviationMap.forEach((name, abbr) -> { | ||
String abbreviationString = abbr.getAbbreviation(); | ||
String shortestUniqueAbbreviation = abbr.getShortestUniqueAbbreviation(); | ||
Abbreviation newAbbreviation = new Abbreviation( | ||
name, | ||
abbreviationString, | ||
shortestUniqueAbbreviation | ||
); | ||
values.add(newAbbreviation); | ||
}); | ||
return values; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This map is deserialized from the store, and since the name
and dotlessAbbreviation
fields are transient, they will be null. So I needed to transform the values so they would be set properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I will close this PR and I will try to think of another good solution trying to learn from my faults in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - no need to close this PR, as everyone has more context via the comments now. You can keep working on this one itself.
private static final String SHORTEST_UNIQUE_TO_ABBREVIATION_MAP_NAME = "ShortestUniqueToName"; | ||
|
||
private final MVStore store; | ||
private MVMap<String, Abbreviation> fullToAbbreviationMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this map still needed? Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Since the name
and dotlessAbreviation
fields are transient, I use this to create the abbreviation from its name (which is the key), the abbreviation string, and the shortUniqueAbbrevition
fields which can be contained from the value of this map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the abbreviation created? I think, it is only required for the UI when abbreviations are changed. When using the logic module for abbreviations, there should be no need for this object?!
In call cases, when updating an abbreviation in the UI the correct behavior needs to be (manually) tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the get
function, I need to create the abbreviation again for the same reason that the abbreviation contains transient fields. If you can confirm that do we need the fields of the abbreviation to be transient or not this may help a lot. From what I understand, I see we don't need them to be transient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please investigate the usage for yourself. I have no clue about the transient - i think, this property needs to be removed if they are kept be stored in the mv store, which I doubt!
This PR will maybe have conflicts with #12544 |
PR #12652 modified this class, too. @marwanemad07 Can you please resolve the conflicts. Then, this PR should be ready for review (isn't it). Then, we should get this in - and then #12544 should be adapted accordingly. |
@trag-bot didn't find any issues in the code! ✅✨ |
JUnit tests are failing. In the area "Some checks were not successful", locate "Tests / Unit tests (pull_request)" and click on "Details". This brings you to the test output. You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide. |
OMG 😅 |
@@ -47,6 +47,10 @@ public static void main(String[] args) throws IOException { | |||
compressHigh(). | |||
open()) { | |||
MVMap<String, Abbreviation> fullToAbbreviation = store.openMap("FullToAbbreviation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does one really need this - I think, this can be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need them to create the mv file to be reused in the JorunalAbbreviationRepository
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if the whole abbreviation functions can be written without using the full object?!
I didn't feel disappointed until I saw it 😂. |
I have no "quick" clue how to proceed here. Think, I can dive into it this Friday the earliest; probably next week. |
Here's some more good news - this happens only on CI and not on his local. |
The bad news is when I profiled the JournalAbbreviationRepositoryTest.java I found that the function Followup: A question for anyone who may work on this issue if he follows the same solution or the current solution. when we profile this it takes around 500MB to 1GB although the file itself is around 4MB. |
Fuzzy matching was introduced at #12652 |
Is this now the time to move to Apache Lucene or to Postgres for journal abbreviations? The best thing would be to use postgres and Soundex (https://www.postgresql.org/docs/current/fuzzystrmatch.html) (first google hit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR should rewrite
- org.jabref.gui.journals.UndoableAbbreviator --> to use methods of the journalAbbreviationRepository not returning
Abbreviation
object, but strings - org.jabref.logic.journals.JournalAbbreviationRepository#getShortestUniqueAbbreviation should not use
get
, but use the maps - org.jabref.logic.journals.JournalAbbreviationRepository#get should be removed
- org.jabref.gui.journals.UndoableUnabbreviator --> similar rewrite as org.jabref.gui.journals.UndoableAbbreviator
- Rewrite org.jabref.gui.preferences.journals.JournalAbbreviationsTabViewModel to use the String methods - and create the
Abbreviation
object for itself - Inline org.jabref.logic.journals.Abbreviation#getNext to org.jabref.logic.journals.JournalAbbreviationRepository#getNextAbbreviation with the help of the Maps.
- org.jabref.gui.preferences.journals.AbbreviationViewModel can stay IMHO
- org.jabref.logic.journals.AbbreviationWriter can also stay - thus
org.jabref.logic.journals.Abbreviation
is also staying inlogic
.
(I was about to say that org.jabref.logic.journals.Abbreviation should be moved to GUI as AbbreviationViewModel
, but then I did not know how to code the writer properly)
ALTERNATIVE: Move org.jabref.logic.journals.Abbreviation to GUI and introduce org.jabref.logic.journals.Abbreviation
as record
without any logic in it.
I will look at the PR again and rewrite it. I want to mention something I noticed, the current solution and my solution take too much memory because of iterating over the MVMap (any part iterating over the MVMap takes too much memory). I tried to read H2 database docs but they are not clear on how they handle it. I will read how they do the deserialization for MVMap. |
@marwanemad07 No need to dive into the MVStore implementation. Just rewrite the class interface to be more "dumb", then, the class can be rewritten to use another backend. Iterating on everything should be necessary only if all abbreviations should be displayed. For abbreviate/unabbreviate it should not be necessary. Oh, there is fuzzy matching. - With this requirement, another backend should be used. -- Thus, proposal:
|
Closes #12571
This PR refactors the
JournalAbbreviationRepository
to useMVMap
for all abbreviation maps and creates these maps inJournalListMvGenerator
to use these maps on demand. This change reduces memory overhead and improves consistency by eliminating redundant in-memory structures.I found that the total execution time decreased to 42ms with memory usage at 2.08MB, compared to the previous 458ms and 146.95MB on my device using IntelliJ Idea's profiler.
Mandatory checks
CHANGELOG.md
described in a way that is understandable for the average user (if change is visible to the user)