Recall difficulty? #19

Closed
gcoda opened this issue Dec 29, 2019 · 6 comments

@gcoda

gcoda commented Dec 29, 2019

I am not sure if this is even a real problem.

I have read in a few places that the more energy you spend trying to recall a thing, the better your chances of remembering it next time. But most real-life apps penalize you for taking too long to recall.

How can I make an updateRecall that takes the result as a float?

As is, it works very well; maybe updating a model based on how long the card was displayed is a mistake.

@fasiha
Owner

fasiha commented Dec 30, 2019

Hey, thank you for writing!

Yes, one of the surprising findings of the psychology of learning and recall is that the more you struggle to remember a thing, the longer you then remember it. That's why people recommend you don't over-review: you're not strengthening your memory if you don't work hard to recall.

And that's a great observation: apps like Memrise (their website) will fail your review if you take too long to answer, and you're right, it's possible that is hurting you!


The problem is, there's no easy way to make updateRecall work with a float instead of boolean 😕. The underlying statistical model assumes that quizzes are binary (i.e., Bernoulli trials, which is why the memory model is a Beta distribution). It's possible that we try to generalize this, so that quizzes are categorical trials, and make the memory model Dirichlet distributions, but that will take a lot of mathematical analysis and might not work in the end.

If we make quizzes categorical trials, that means instead of a quiz being pass/fail, it gets a rating, from a pre-defined set of categories, e.g., how Anki has fail/hard/good/easy. The downside is, a categorical trial assumes there is no relationship between categories, which would be very unsatisfying, since we know there's a very specific relationship between "hard" versus "good" versus "easy"…
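
To make that downside concrete, here is a minimal sketch of the textbook Dirichlet-categorical update. It is not Ebisu code: the names `CATEGORIES`, `dirichlet_update` and `expected_probabilities` are made up for illustration, and it ignores the time-since-last-review part that a real memory model would need.

```python
from typing import Dict

# Hypothetical rating labels, borrowed from the Anki example above.
CATEGORIES = ("fail", "hard", "good", "easy")


def dirichlet_update(counts: Dict[str, float], rating: str) -> Dict[str, float]:
    """Textbook conjugate update: add one pseudo-count to the observed category."""
    if rating not in counts:
        raise KeyError(f"unknown rating: {rating}")
    updated = dict(counts)  # copy so the prior is left untouched
    updated[rating] += 1.0
    return updated


def expected_probabilities(counts: Dict[str, float]) -> Dict[str, float]:
    """Mean of a Dirichlet distribution: each count divided by the total."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


prior = {name: 1.0 for name in CATEGORIES}  # flat prior over the four ratings
after_hard = expected_probabilities(dirichlet_update(prior, "hard"))

# "good" and "easy" end up with the same probability: the model has no idea
# that "hard" is closer to "good" than it is to "easy".
print(after_hard)
```

Observing "hard" only raises the probability of "hard"; every other category is lowered by the same amount, which is exactly the lack of ordering described above.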


If you are writing a quiz app, I strongly recommend you store display time for each card in your database, because someone might find an elegant and robust way of incorporating that into the update step. Or we might do some machine learning to see if time-spent-on-quiz can predict the strength of the memory.

This is definitely a good idea, something that I'll be thinking about!

@gcoda
Author

gcoda commented Dec 30, 2019

Thanks for the really detailed explanation.

I suspected the right way was going to be difficult.
But my simple mind was thinking of updating a model with the mean of updateRecall(true) and updateRecall(false).

Following your suggestion to store display time, I will think about this later, with some logs and hopes for a linear approximation.

@fasiha
Owner

fasiha commented Feb 3, 2020

There’s one way that Ebisu could handle something other than binary quizzes. Have you tried Memrise or Duolingo? Both of those apps have the concept of a recall session, where the same flashcard may be reviewed multiple times, not just once. Therefore, instead of a flashcard being pass/fail, you get a percentage: “this flashcard was shown N times during this quiz session and the student knew it K ≤ N times”.

Ebisu inherited the idea of “N always equals one” from Anki but it should be straightforward to extend the mathematics to handle this case. The quiz goes from being a Bernoulli random variable to a binomial random variable.
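
As a rough illustration of the Bernoulli-to-binomial step, the sketch below shows the textbook Beta-binomial conjugate update. It is a simplification under stated assumptions, not Ebisu's actual update: it ignores the time elapsed since the last review, and `binomial_update` is a hypothetical helper name.

```python
from typing import Tuple


def binomial_update(
    alpha: float, beta: float, successes: int, total: int
) -> Tuple[float, float]:
    """Textbook update of a Beta(alpha, beta) prior after K successes in N trials."""
    if not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total")
    # Successes add to alpha, failures add to beta.
    return alpha + successes, beta + (total - successes)


prior = (2.0, 2.0)

# N = 1 recovers the usual pass/fail (Bernoulli) quiz.
single_pass = binomial_update(*prior, successes=1, total=1)

# A session where the card was shown 4 times and recalled 3 times.
session = binomial_update(*prior, successes=3, total=4)

print(single_pass, session)
```

With N fixed at one, this reduces to the familiar pass/fail case, so the binomial quiz is a strict generalization.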

This isn’t quite what you had in mind: it’s not the same as “difficulty of a single test”, and it doesn’t capture time-to-answer, but I think it’ll make the library more useful for more types of quiz apps.

I’ll update this thread when I get to this.

@fasiha
Owner

fasiha commented Feb 3, 2020

Of course you could abuse the binomial quiz feature by claiming that a single easy review was actually “99 successes out of 100 times seen” (that is, one quiz session consisted of seeing this card a hundred times and you got it right 99 of them), or 75/100 for a normal recall difficulty, or 50/100 for a hard difficulty 😝!

Which is what you asked for: an updateRecall that accepts a float (between 0 and 1), not just a Boolean.
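
The trick above amounts to turning a float into a (successes, total) pair. A minimal sketch, assuming a hypothetical helper called `float_to_counts` that is not part of Ebisu, with the default of 100 trials taken from the example above:

```python
from typing import Tuple


def float_to_counts(result: float, total: int = 100) -> Tuple[int, int]:
    """Map a recall score in [0, 1] to a (successes, total) pair for a binomial quiz."""
    if not 0.0 <= result <= 1.0:
        raise ValueError("result must be between 0 and 1")
    if total < 1:
        raise ValueError("total must be at least 1")
    # Round to the nearest whole number of successes.
    return round(result * total), total


easy = float_to_counts(0.99)    # the "easy" review from the example
normal = float_to_counts(0.75)  # "normal" difficulty
hard = float_to_counts(0.50)    # "hard" difficulty

print(easy, normal, hard)
```

The choice of `total` is arbitrary here. In a plain Beta-binomial update a larger total makes one review count as more evidence, so it is worth experimenting with smaller values before trusting the result.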

(I’m still not sure how to best take into account time-to-answer. I definitely think it is a good idea, but it will be hard for quiz apps to accurately measure time-to-answer (what if a user was just distracted while doing their quiz?), and so data will be noisy. Maybe we can assume that, if the user responded very quickly, the memory is very strong (and maybe the quiz was too soon) but if the user takes a long time to respond, then that doesn’t mean anything (they may have gotten distracted or they may have actually been trying to recall the whole time). If we do that, then we’d need to decide how long is “long”… something to continue thinking about!)

@fasiha
Owner

fasiha commented Mar 9, 2020

Tentatively closed by version 2.0.0 (https://github.com/fasiha/ebisu/blob/gh-pages/CHANGELOG.md): the new binomial quiz feature (where updateRecall takes two integers, successes and total) can be used to hack this, but I don't yet know how reliable it is, or whether there are any pitfalls or caveats.

Feel free to reopen, anyone!

fasiha closed this as completed Mar 9, 2020

@fasiha
Owner

fasiha commented Jun 12, 2020

@gcoda I discuss making the result a float at #23 (comment) and I'd welcome your thoughts!
