eyalmazuz
Contributor

@eyalmazuz eyalmazuz commented May 20, 2025

Closes #202

This feature extends the {frequencies} marker with the extended marker syntax and adds two new options: min-value, which exports only the lowest frequency value out of all the available entries, and value-only, which strips the dictionary name and HTML list formatting to keep only the raw numbers.

This allows Memento to support card formats that rely on raw frequency values to sort their cards.

Example of results:
{frequencies} or {frequencies:min-value=false,value-only=false}
image

{frequencies:min-value=true,value-only=false}
image

{frequencies:min-value=false,value-only=true} (note: values are separated by \n)
image

{frequencies:min-value=true,value-only=true}
image
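
For readers without the screenshots, the four combinations can be sketched roughly as follows. This is a minimal Python illustration, not Memento's actual C++ code: the function name `render_frequencies`, the exact `<ul><li>` layout, and the sample values are assumptions. Only the option names and the `\n` separator come from the description above.

```python
from html import escape


def render_frequencies(freqs, min_value=False, value_only=False):
    """Render (dictionary_name, value) pairs the way the marker options describe.

    Hypothetical helper: the HTML layout is an assumption, and every value
    is assumed to be a plain number here.
    """
    if not freqs:
        return ""
    if min_value:
        # Keep only the entry with the lowest numeric value.
        freqs = [min(freqs, key=lambda pair: float(pair[1]))]
    if value_only:
        # Drop dictionary names and HTML; separate values with newlines.
        return "\n".join(value for _, value in freqs)
    items = "".join(
        f"<li>{escape(name)}: {escape(value)}</li>" for name, value in freqs
    )
    return f"<ul>{items}</ul>"


# Made-up sample data for illustration only.
sample = [("JPDB", "1451"), ("VNFreq", "3200")]
print(render_frequencies(sample, min_value=True, value_only=True))
```

With both options enabled, the result is a single bare number, which is what a card template sorting on frequency would need.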

Edit: I would add that this essentially serves the same purpose as {frequency-harmonic-rank} and {frequency-average-rank}, namely sorting Anki cards based on frequency values, just with different values. So this might not bring any novel or unique features to Memento, but it gives users more variety in the values available for their fields.

Owner

@ripose-jp ripose-jp left a comment

I like this change overall, but one thing you're overlooking is that not all frequency dictionaries encode numbers or integers. For example, VNFreq encodes floating-point values from 0 to 100. Testing this out, this isn't supported by Memento, but it should be, so I'll fix that later. An older edition of this dictionary used strings with 1 to 5 star characters. I don't really care how you handle these cases, but you should make a conscious choice.

Also make sure to document your markers on the UI help page.

@eyalmazuz
Contributor Author

eyalmazuz commented May 21, 2025

@ripose-jp
I fixed the code based on your comments.
I added documentation to the markers in the UI help page.

> but one thing you're overlooking is that not all frequency dictionaries encode numbers or integers. For example, VNFreq encodes floating points from 0 to 100.

I converted the method to use double values so it can support future updates to Memento and dictionaries that don't use only integers.

Regarding

> An older edition of this dictionary used strings with 1 to 5 star characters.

I couldn't find the older version of this dictionary.
I did find:
https://docs.google.com/document/d/1IUWkvBxhoazBSTyRbdyRVk7hfKE51yorE86DCRNQVuw/edit?tab=t.0
which has versions of frequency dictionaries in the star format, so I copied the star symbol from there.

In the Shoui dictionary collection, VNFreq is V2, which uses integer values from 1 to 35k, similar to JPDB and others.
The same goes for other collections.

I would say that I created a helper function that tries to parse the string value of the frequency as a double and, if that fails, falls back to a default value of 1000.0.
This way, if other dictionaries use some other obscure format for their frequency values (e.g., common, rare, very rare, unique), we could just extend the parsing method to support those values and still get the same effect.

But when the export happens, I use the original string value from the dictionary,
so it will export "★★★★", for example, instead of the literal value of 80.0 or 1000.0 (in case parsing fails).
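
The helper described here could look roughly like the following Python sketch. It is an illustration under assumptions, not the code in the PR: the names `parse_frequency` and `export_min` are hypothetical, and the 20.0-per-star mapping is only a guess chosen so that "★★★★" yields the 80.0 mentioned above. The 1000.0 fallback and the "compare parsed values, export the original string" behaviour are taken from the comment.

```python
DEFAULT_FREQUENCY = 1000.0  # fallback used when a value cannot be parsed
STAR = "★"


def parse_frequency(raw):
    """Return a sortable number for a raw frequency string (hypothetical helper)."""
    text = raw.strip()
    try:
        return float(text)
    except ValueError:
        pass
    if text and all(ch == STAR for ch in text):
        # Assumed mapping: each star counts for 20.0, so four stars give 80.0.
        return 20.0 * len(text)
    return DEFAULT_FREQUENCY


def export_min(values):
    """Pick the entry with the lowest parsed value, but return its original text."""
    return min(values, key=parse_frequency)


print(export_min(["1451", "★★★★"]))
```

Because the parsed number is used only as a sort key, the exported text stays exactly as the dictionary wrote it.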

@ripose-jp
Owner

I made an update to your branch. I found that returning 1000 for unusable values caused some problems. In particular, JPDB includes 1451 and 271907㋕ for 口を開けて. If min-value is set, 271907㋕ ends up getting used, which doesn't seem right. I've changed it to only parse the numeric parts of the string and chuck the rest. If it can't get a numeric value, it just doesn't consider the value for the purposes of min-value. Let me know if this is acceptable or if you have other ideas.
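
To make the problem and the fix concrete: with the 1000 fallback, 271907㋕ fails to parse, is treated as 1000, and so beats 1451 as the minimum. A rough Python sketch of the revised behaviour follows. It is not the actual Memento change: the names `numeric_part` and `min_frequency` are hypothetical, and taking the first run of ASCII digits is an assumption about what "the numeric parts" means.

```python
import re

# First run of ASCII digits, with an optional decimal part.
_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def numeric_part(raw):
    """Return the leading numeric portion of a frequency string, or None."""
    match = _NUMBER.search(raw)
    return float(match.group()) if match else None


def min_frequency(values):
    """Return the original string with the lowest numeric part.

    Values with no numeric part are ignored; returns None if nothing is usable.
    """
    usable = [(numeric_part(v), v) for v in values]
    usable = [(number, v) for number, v in usable if number is not None]
    if not usable:
        return None
    return min(usable, key=lambda pair: pair[0])[1]


print(min_frequency(["1451", "271907㋕"]))
```

Here 271907㋕ is compared as 271907 rather than as the fallback, so 1451 wins, and a value such as a run of stars is simply left out of the comparison.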

@eyalmazuz
Contributor Author

> I made an update to your branch. I found that returning 1000 for unusable values caused some problems. Particularly JPDB includes 1451 and 271907㋕ for 口を開けて. If min-value is set, 271907㋕ ends up getting used which doesn't seem right. I've changed it to only parse the numeric parts of the string and chuck the rest. If it can't get a numeric value, it just doesn't consider the value for the purposes of min-value. Let me know if this acceptable or if you have other ideas.

I'm perplexed how you spotted that edge case(s). Honestly, I can't think of anything off the top of my head that could help improve it; I'm not really knowledgeable about the wide variety of dictionaries.
So I don't have any more comments.

@ripose-jp
Owner

> I'm perplexed how you spotted that edge case(s).

Just dumb luck skipping around Spirited Away. I can't think of any other problems with this PR, so I'm merging it.

@ripose-jp ripose-jp merged commit 1159ccd into ripose-jp:master May 22, 2025
3 checks passed
Successfully merging this pull request may close these issues.

[Feature request] Export only lowest frequency value