Forums - Discussion: future studying trends implementation
Top > renshuu.org > Feature Requests/Improvements
This is a request that has come up repeatedly over the years. While I am not in a position to work on it now, I think a good discussion could help lay out a framework for what needs to be done, so I can hopefully make it a reality in the future!
Goal: provide the user with a projection of how many reviews to expect in the near future (1-30 days), extrapolated from recent study habits. This would work at both the schedule level and the account level.
Benefits: the main benefit I see is that it will allow users to better adjust their study amounts (especially with regard to adding new terms) so they do not get overwhelmed. It could potentially *even* be used to automatically adjust new term amounts to keep reviews from going over a certain amount, to aim for a specific average number of reviews per day, etc.
This is one of those things that is relatively easy to do a "decent" job of, but surprisingly hard to get the numbers good enough to actually be useful, rather than a "well, it's there, but the numbers are never right, so I won't use it" feature.
Below are all the levels of data I can think of that would need to be considered, along with potential difficulties. I am relatively good at math, but not math math, if that makes sense, so help coming up with equations to handle some of the data extrapolation would be welcome. Please comment on anything you think is wrong, missing, etc.
Level 1: Review prediction of pre-existing terms.
For any given term, renshuu keeps track of how many times the term has been answered correctly or missed for each study vector. For example, word X might have separate correct/missed counts for kanji > kana, kana > meaning, meaning > kana, etc. It also stores the next ideal study date for each vector.
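A minimal sketch of the per-term data described above, assuming a simple in-memory model (the class and field names are hypothetical, not renshuu's actual schema). Pooling correct/missed counts across a term's vectors gives one rough accuracy figure a predictor could start from:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VectorStats:
    """Counts for one study vector (e.g. "kanji > kana"); hypothetical fields."""
    correct: int = 0
    missed: int = 0
    next_due: int = 0  # next ideal study date, in days from today


@dataclass
class Term:
    term_id: int
    vectors: dict[str, VectorStats] = field(default_factory=dict)

    def accuracy(self) -> Optional[float]:
        """Share of correct answers pooled over all vectors, or None with no history."""
        correct = sum(v.correct for v in self.vectors.values())
        total = correct + sum(v.missed for v in self.vectors.values())
        return correct / total if total else None

    def due_vectors(self, day: int) -> list[str]:
        """Names of vectors whose next ideal study date is on or before `day`."""
        return [name for name, v in self.vectors.items() if v.next_due <= day]


word_x = Term(1, {
    "kanji > kana": VectorStats(correct=8, missed=2, next_due=0),
    "kana > meaning": VectorStats(correct=4, missed=0, next_due=6),
})
print(word_x.accuracy())      # 12 / 14, about 0.857
print(word_x.due_vectors(0))  # ['kanji > kana']
```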
So, assuming a user always answers correctly, it is very easy to say that for vector A of term X, the next study dates will be (for example) today, in 5 days, in 14 days, in 20 days, etc.
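Under the always-correct assumption, the projection is just a walk up the level ladder. A minimal sketch, assuming a hypothetical table of days between reviews per level (renshuu's real spacing depends on user settings, and these made-up intervals do not reproduce the example dates above):

```python
# Hypothetical days between reviews, indexed by level (NOT renshuu's real values).
INTERVALS = [1, 2, 5, 14, 30, 60]


def perfect_recall_reviews(due_day: int, level: int, horizon: int) -> list[int]:
    """Days (0 = today) on which one vector is reviewed, if every answer is correct."""
    days = []
    while due_day < horizon:
        days.append(due_day)
        level = min(level + 1, len(INTERVALS) - 1)  # a correct answer raises the level
        due_day += INTERVALS[level]                   # next review after that level's interval
    return days


print(perfect_recall_reviews(due_day=0, level=1, horizon=30))  # [0, 5, 19]
```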
**Challenge** - how can accuracy be factored into this system? As a rough example, say the user gets the term wrong 33% of the time. Each time they miss it, the level will drop (based on their settings), so the schedule might instead look like this: in 5 days, 14 days, 2 days, 5 days, 14 days, 2 days, 14 days, etc.
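One way to fold accuracy in is Monte Carlo simulation (a swapped-in technique, not something renshuu is known to do): replay a vector's future many times, missing each review with probability 1 - accuracy, and average the per-day review counts. A minimal sketch, assuming a hypothetical interval table and a hypothetical rule that a miss drops the vector two levels:

```python
import random

INTERVALS = [1, 2, 5, 14, 30, 60]  # hypothetical days between reviews, per level
LEVEL_DROP = 2                     # hypothetical: levels lost on a missed answer


def simulate_vector(due_day: int, level: int, accuracy: float,
                    horizon: int, rng: random.Random) -> list[int]:
    """One random future for one vector: the days (0 = today) it gets reviewed.

    due_day must be >= 0; treat overdue vectors as due today.
    """
    days = []
    while due_day < horizon:
        days.append(due_day)
        if rng.random() < accuracy:  # correct answer: move up a level
            level = min(level + 1, len(INTERVALS) - 1)
        else:                        # miss: drop back LEVEL_DROP levels
            level = max(level - LEVEL_DROP, 0)
        due_day += INTERVALS[level]
    return days


def expected_reviews(due_day: int, level: int, accuracy: float,
                     horizon: int = 30, runs: int = 10_000, seed: int = 0) -> list[float]:
    """Monte Carlo estimate of the expected number of reviews on each day."""
    rng = random.Random(seed)  # seeded so repeated forecasts are stable
    counts = [0] * horizon
    for _ in range(runs):
        for day in simulate_vector(due_day, level, accuracy, horizon, rng):
            counts[day] += 1
    return [c / runs for c in counts]


# A vector due today at level 1, answered wrong about 33% of the time.
forecast = expected_reviews(due_day=0, level=1, accuracy=0.67)
```

Summing these per-day lists over every vector gives a schedule- or account-level forecast; the cost of doing it vector by vector is exactly the feasibility concern raised in this post.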
Additionally, renshuu adjusts a term's other vectors when you get a question right. For example, term X has vectors A, B, and C. A is due for review today and B is due tomorrow (B was last reviewed 20 days ago). If you get A correct, B gets pushed back a few days. The reason is that term X is now in somewhat recent memory, so studying B the following day is not quite the same as "this term was last studied 21 days ago, so the review will accurately gauge whether they can remember it after 21 days." Since each vector is not studied in a vacuum, renshuu tries to at least moderately adjust for this.
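For a forecaster, this means vectors of the same term cannot be projected independently: each simulated correct answer has to shift its siblings before the next day is processed. A minimal sketch of that coupling, assuming a hypothetical rule (siblings due within 3 days, or overdue, are pushed 3 days past the later of their due date and today); the real window and push amount are not stated here:

```python
from dataclasses import dataclass

PUSH_WINDOW = 3  # hypothetical: siblings due within this many days are affected
PUSH_DAYS = 3    # hypothetical: how many days they are pushed back


@dataclass
class Vector:
    name: str     # study direction, e.g. "kanji > kana"
    due_day: int  # next ideal study date, in days from today


def push_back_siblings(vectors: list[Vector], answered: Vector, today: int) -> None:
    """Called after `answered` gets a correct answer on `today`.

    Any other vector of the same term that is due soon (or overdue) is moved
    later, because the term is now in recent memory.
    """
    for v in vectors:
        if v is answered:
            continue  # the answered vector is rescheduled by the normal level logic
        if v.due_day - today <= PUSH_WINDOW:
            v.due_day = max(v.due_day, today) + PUSH_DAYS


term_x = [Vector("A", 0), Vector("B", 1), Vector("C", 10)]
push_back_siblings(term_x, term_x[0], today=0)
print([(v.name, v.due_day) for v in term_x])  # [('A', 0), ('B', 4), ('C', 10)]
```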
On top of that, for most users the spacing of reviews adjusts itself depending on how accurate they have been in the past. For example, the review spacing for a term you have never gotten wrong is significantly larger than for one you have had a lot of trouble leveling up.
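Since spacing depends on history, a forecast has to carry each term's correct/missed counts along, not just its level. A minimal sketch, assuming a hypothetical rule that maps Laplace-smoothed accuracy onto a 0.5x-1.5x multiplier of a level's base interval (renshuu's actual adjustment is not described here):

```python
def scaled_interval(base_days: int, correct: int, missed: int) -> int:
    """Stretch or shrink a level's base interval using the term's past accuracy.

    Hypothetical rule: Laplace-smoothed accuracy mapped to a 0.5x-1.5x multiplier.
    """
    smoothed = (correct + 1) / (correct + missed + 2)  # 0.5 when there is no history
    multiplier = 0.5 + smoothed                         # between 0.5 and 1.5
    return max(1, round(base_days * multiplier))        # never less than one day


print(scaled_interval(14, correct=10, missed=0))  # 20
print(scaled_interval(14, correct=5, missed=5))   # 14
```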
Given these challenges, it already feels like a hard problem to map out the potential reviews of one term, let alone 100 or 1000.
(Note: I am not even sure it will be computationally feasible to estimate this per term, or whether a higher level of aggregation is needed. For users with 10,000+ terms averaging 3-4 vectors per term, that means predictions for 30,000-40,000+ vectors.)
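One possible answer to the feasibility question is to aggregate by state instead of by term. Under a simple model where a vector's future depends only on its level and due date, all vectors sharing a (level, due day) pair behave identically, so their expected counts can be propagated together through a Markov chain (a swapped-in technique, not renshuu's current method). The work then scales with the number of distinct states (levels × days in the horizon) rather than with the 30,000-40,000 vectors. A minimal sketch, assuming hypothetical intervals, a two-level drop on a miss, and a single accuracy figure per call (vectors could also be bucketed by accuracy band, at the cost of more states); it ignores the sibling push-back described above:

```python
from collections import defaultdict

INTERVALS = [1, 2, 5, 14, 30, 60]  # hypothetical days between reviews, per level
LEVEL_DROP = 2                     # hypothetical: levels lost on a missed answer


def forecast(buckets: dict[tuple[int, int], float], accuracy: float,
             horizon: int = 30) -> list[float]:
    """Expected reviews per day for many vectors at once.

    `buckets` maps (level, due_day) -> number of vectors in that state, with
    due_day >= 0 (enter overdue vectors as due today, day 0). Vectors in the
    same state are interchangeable, so their expected counts move together.
    """
    pending = defaultdict(float, buckets)
    daily = [0.0] * horizon
    top = len(INTERVALS) - 1
    for day in range(horizon):
        for level in range(top + 1):
            count = pending.pop((level, day), 0.0)
            if count == 0.0:
                continue
            daily[day] += count
            # Split the bucket: a share moves up a level, the rest drops back.
            up = min(level + 1, top)
            down = max(level - LEVEL_DROP, 0)
            pending[(up, day + INTERVALS[up])] += count * accuracy
            pending[(down, day + INTERVALS[down])] += count * (1 - accuracy)
    return daily


# Ten thousand vectors in two states cost the same to forecast as two vectors would.
per_day = forecast({(1, 0): 6000.0, (3, 2): 4000.0}, accuracy=0.67)
```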
Level 2: Prediction of terms not yet learned.
If level 1 can be handled, then I do not think level 2 will be *too* hard. If we estimate the average rate at which the user adds new terms, we can treat those terms as level 1 terms from the point they are added, and map out their reviews from there.
**Challenge** - schedules are not infinite, so this would need to make sure it stops considering new terms once the schedule is exhausted.
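A minimal sketch of that level 2 input, assuming the user keeps adding new terms at their recent average rate (a simple moving average over a hypothetical `recent_daily_new` history) and stops once the schedule's unseen terms run out. Each day's count would then be fed into a level 1 forecast as a batch of lowest-level vectors due that day:

```python
import math


def project_new_terms(recent_daily_new: list[int], remaining_in_schedule: int,
                      horizon: int = 30) -> list[int]:
    """Projected number of brand-new terms introduced on each of the next `horizon` days."""
    # Average new terms per day over the recent window (0 if there is no history).
    rate = sum(recent_daily_new) / len(recent_daily_new) if recent_daily_new else 0.0
    projected = []
    added_so_far = 0
    for day in range(horizon):
        # A cumulative target keeps fractional rates honest (7.5/day -> 7, 8, 7, 8, ...)
        # and is capped so the projection stops once the schedule is exhausted.
        target = min(remaining_in_schedule, math.floor(rate * (day + 1)))
        projected.append(target - added_so_far)
        added_so_far = target
    return projected


print(project_new_terms([5, 10, 5, 10], remaining_in_schedule=40, horizon=10))
# [7, 8, 7, 8, 7, 3, 0, 0, 0, 0]
```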
**Challenge** - for users who sometimes or frequently use "I already know this" (this is not currently saved in analytics, so I would have to start saving it), you can no longer assume that all new terms start at level 1 (0%). For some users on some sets of materials, 30% or more might be marked this way!
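Once "I already know this" is saved, a forecast could split each day's batch of new terms in two. A minimal sketch, assuming a hypothetical starting level for terms marked as known, and shrinking the observed fraction toward a prior (an arbitrary 10%, weighted as if it were 20 terms) so that a user's first few marks do not swing the estimate:

```python
INTERVALS = [1, 2, 5, 14, 30, 60]  # hypothetical days between reviews, per level
KNOWN_LEVEL = 4                    # hypothetical level given to "I already know this" terms


def known_fraction(marked_known: int, introduced: int,
                   prior: float = 0.1, prior_weight: int = 20) -> float:
    """Share of new terms the user marks as already known, shrunk toward `prior`."""
    return (marked_known + prior * prior_weight) / (introduced + prior_weight)


def cohort_buckets(n_terms: float, day: int,
                   frac_known: float) -> dict[tuple[int, int], float]:
    """Expected (level, due_day) -> count for one day's batch of new terms."""
    return {
        # Genuinely new terms: lowest level, first review on the day they are added.
        (0, day): n_terms * (1 - frac_known),
        # Terms marked as known: skip ahead to the hypothetical KNOWN_LEVEL.
        (KNOWN_LEVEL, day + INTERVALS[KNOWN_LEVEL]): n_terms * frac_known,
    }


print(cohort_buckets(8, day=3, frac_known=0.25))  # {(0, 3): 6.0, (4, 33): 2.0}
```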
**Challenge** - Word/Kanji helper schedules grow in size over time, so it is harder to estimate how large the schedule is.
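For the growing helper schedules, one rough option is to extrapolate the schedule's size from recent snapshots with an ordinary least-squares line. A minimal sketch, assuming hypothetical (day, size) snapshots are recorded and that growth is roughly linear over a 1-30 day horizon; subtracting the terms already introduced would give the remaining count the level 2 projection needs:

```python
def projected_helper_size(snapshots: list[tuple[int, int]], days_ahead: int) -> float:
    """Least-squares linear extrapolation of a helper schedule's size.

    `snapshots` holds (day, schedule_size) pairs (at least one); the result is
    the fitted size `days_ahead` days after the most recent snapshot.
    """
    n = len(snapshots)
    mean_x = sum(day for day, _ in snapshots) / n
    mean_y = sum(size for _, size in snapshots) / n
    var_x = sum((day - mean_x) ** 2 for day, _ in snapshots)
    if var_x == 0:
        # All snapshots share one date: no trend to fit, so assume no growth.
        return mean_y
    slope = sum((day - mean_x) * (size - mean_y) for day, size in snapshots) / var_x
    target_day = max(day for day, _ in snapshots) + days_ahead
    return mean_y + slope * (target_day - mean_x)


print(projected_helper_size([(0, 100), (10, 120), (20, 140)], days_ahead=10))  # 160.0
```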
Level 3: Word/Kanji interaction
Most users keep the settings under which kanji forms only become available for words after the kanji themselves have been studied in renshuu. A term going from "no kanji known" to "kanji known" (in a way) resets the term's apparent mastery level, which throws its review schedule into a bit of chaos.
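A minimal sketch of how that reset could be represented in a forecast, assuming a hypothetical rule: when the last kanji in a word becomes known, the word's kanji-based vectors are added (or restarted) at the lowest level and are due that day, which drags down the term's apparent mastery. The data model and vector names are illustrative. A forecaster would also need to predict *when* each kanji gets studied to know when these resets land:

```python
from dataclasses import dataclass, field

KANJI_VECTORS = ("kanji > kana", "kanji > meaning")  # hypothetical vector names


@dataclass
class Word:
    text: str
    kanji: set[str]
    # vector name -> (level, due_day); names are illustrative only
    vectors: dict[str, tuple[int, int]] = field(default_factory=dict)
    kanji_unlocked: bool = False


def on_kanji_studied(words: list[Word], known_kanji: set[str], today: int) -> list[str]:
    """Unlock kanji-based vectors for words whose kanji just became fully known.

    Returns the words that were reset. Words already unlocked (or with no kanji)
    are skipped, so calling this again later does not reset them a second time.
    """
    reset = []
    for word in words:
        if word.kanji_unlocked or not word.kanji or not word.kanji <= known_kanji:
            continue
        word.kanji_unlocked = True
        for name in KANJI_VECTORS:
            word.vectors[name] = (0, today)  # hypothetical: lowest level, due now
        reset.append(word.text)
    return reset


nihon = Word("日本", {"日", "本"}, {"kana > meaning": (4, 20)})
print(on_kanji_studied([nihon], {"日"}, today=5))        # []
print(on_kanji_studied([nihon], {"日", "本"}, today=5))  # ['日本']
print(nihon.vectors)
# {'kana > meaning': (4, 20), 'kanji > kana': (0, 5), 'kanji > meaning': (0, 5)}
```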
Others:
- multiple schedules with overlapping materials - not too bad for level 1, but tricky for managing new term counts.
- types of new term restrictions (per schedule, global, per day, per week, etc.)
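A minimal sketch of how those two items interact when counting a day's new terms, assuming hypothetical inputs: schedules listed in priority order, each with an ordered queue of unseen term IDs and its own daily limit, plus a global daily limit and the remaining weekly allowance. A term shared by two schedules is only introduced once:

```python
def pick_new_terms(queues: dict[str, list[int]], per_schedule_limit: dict[str, int],
                   global_daily_limit: int, weekly_remaining: int,
                   already_studied: set[int]) -> list[int]:
    """Term IDs that would be introduced today under all restrictions at once."""
    cap = min(global_daily_limit, weekly_remaining)  # tightest account-wide limit wins
    chosen: list[int] = []
    seen = set(already_studied)
    for schedule, queue in queues.items():  # dicts keep insertion (priority) order
        taken = 0
        for term_id in queue:
            if len(chosen) >= cap:
                return chosen
            if taken >= per_schedule_limit.get(schedule, 0):
                break
            if term_id in seen:
                continue  # overlapping material: already studied or picked via another schedule
            chosen.append(term_id)
            seen.add(term_id)
            taken += 1
    return chosen


queues = {"schedule A": [1, 2, 3, 4], "schedule B": [3, 4, 5, 6]}
limits = {"schedule A": 3, "schedule B": 3}
print(pick_new_terms(queues, limits, 10, 10, set()))  # [1, 2, 3, 4, 5, 6]
print(pick_new_terms(queues, limits, 10, 5, set()))   # [1, 2, 3, 4, 5]
```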
These are just the things off the top of my head, but I can see enough complications that unless a really good statistical system is built, the estimates would be too rough to be worth the time it takes to develop this.
Thoughts on the merits of this system aside, I'd love for more math/stats-oriented people to weigh in on this.