Due to some data decisions that were made 15+ years ago, renshuu has had the issue of multiple versions of the same term floating around in different lists: for example, 朝ごはん and 朝ご飯 (both meaning breakfast, both read あさごはん). A lot of work has been done to clean this up so that only the most common version is present in renshuu's materials, but there are still a number of cases where more than one is present.
Additionally, even for cases where it has already been cleaned up, a user might have already studied more than one of them, so they have what seem like duplicates in their word lists.
However, cleaning this up will require actual changes to existing user data, which is something that most changes in renshuu do not cause. Because of that, I want to run a test round with a small number of users first and make sure it works.
So, looking for volunteers! These are the steps I will take for the volunteers:
1. Isolate the main term among the duplicates.
2. Maximize the mastery levels on that term, pulling from the mastery levels of the duplicates.
3. Hide (but not remove... yet) the duplicates in their account so they do not appear in studying. They WILL, however, still appear in lists (just as hidden).
4. After steps 1-3 are confirmed, suppress the duplicates in all schedules, and add the primary term to all schedules that do not yet have it.
This will effectively remove them. If this makeshift step works, then I can roll it out to everyone. At this point, I will actually remove the duplicates from the original schedules/lists (replacing them with the primary term), and remove the duplicates from your local schedules. The reason I have to wait for removing the duplicates from your schedules is that if I do so before this step, they will automatically reacquire the duplicate terms from the original source lists.
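For anyone curious what steps 2 and 3 amount to, here is a rough sketch in Python. It is not renshuu's actual code: the `TermProgress` record, the idea that mastery is stored as one level per quiz type, and the `consolidate` function are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TermProgress:
    """Hypothetical per-user record for one written form of a term."""
    term: str
    mastery: dict[str, int]  # assumed shape: quiz type -> mastery level
    hidden: bool = False


def consolidate(primary: TermProgress, duplicates: list[TermProgress]) -> None:
    """Step 2: raise the primary's mastery to the best level seen on any
    duplicate. Step 3: hide the duplicates without deleting them."""
    for dup in duplicates:
        for quiz_type, level in dup.mastery.items():
            current = primary.mastery.get(quiz_type, 0)
            primary.mastery[quiz_type] = max(current, level)
        dup.hidden = True  # still present in lists, just not studied


primary = TermProgress("朝ご飯", {"meaning": 3, "reading": 1})
duplicate = TermProgress("朝ごはん", {"meaning": 2, "reading": 4})
consolidate(primary, [duplicate])
```

Taking the maximum per quiz type means nobody loses progress: whichever version you studied more is the one that counts.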
Please let me know if you are interested! I only need users who know that they have multiple copies of some words in their lists (whether in a single schedule, their overall user account, or anywhere else). So if you are a relatively new user, this most likely does not apply to you.
My account has 1,761 words currently. I don't know which dupes there are, but there are likely a few, as I manually added some words before they later came up in lessons. I am already beta testing the Android app and have opted in to other beta tests, so I can be one of the testers for this as well.
Does this mean that the option to add a different written version will be taken out to only have one version as well? I prefer to add the version that I come across while immersing, and I also prefer to add the version that includes all the kanji. So I have really liked being able to choose the version to add to my schedules. Will this still be available after all the consolidation?
I’d be happy to help with this! I think I’ve noticed some duplicate words in my schedules. I’ve definitely noticed that some onomatopoeia words were added in both katakana and hiragana. Would those be considered duplicates as well?
I could volunteer! I have a few duplicates and some possible edge cases
- frequently I have hidden all but one of the duplicate words, so whichever you select as the primary version should only be hidden iff all of the duplicates were hidden
- たばこ and タバコ are duplicates to me, but maybe not to others
and it seems like step 4 should not add the word to lists that contained none of the duplicates?
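To make the two edge cases above concrete, here is a minimal Python sketch of what I am suggesting. The function names and the data shapes (a list of hidden flags, a dict of list name to set of terms) are hypothetical, not anything renshuu exposes.

```python
def primary_hidden(hidden_flags: list[bool]) -> bool:
    """The primary term ends up hidden only if every duplicate the user
    had was hidden (and the user had at least one)."""
    return bool(hidden_flags) and all(hidden_flags)


def lists_needing_primary(
    lists: dict[str, set[str]], primary: str, variants: set[str]
) -> list[str]:
    """Step 4 adds the primary only to lists that already contained some
    variant of the term but not the primary itself."""
    return [
        name
        for name, terms in lists.items()
        if terms & variants and primary not in terms
    ]


my_lists = {"a": {"たばこ"}, "b": {"タバコ"}, "c": {"犬"}}
to_update = lists_needing_primary(my_lists, "タバコ", {"たばこ", "タバコ"})
```

Here only list "a" would get the primary added: "b" already has it, and "c" never contained either version.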
I would like to help as well, but I'm not sure if I have any duplicates left.
For a while 固苦しい read as かたくるしい and かたぐるしい came up at the same time in quizzes and I always got the reading wrong, until I removed one of them from my schedules. I haven't really had any other terms that drove me crazy like that. But there might still be duplicates that at least don't show up both on the same day.
Does this mean that the option to add a different written version will be taken out to only have one version as well? I prefer to add the version that I come across while immersing, and I also prefer to add the version that includes all the kanji. So I have really liked being able to choose the version to add to my schedules. Will this still be available after all the consolidation?
That's a good question. At the moment, the way the conversion system is set up is that if that pair happens to be split across renshuu materials, they will be consolidated both in the materials, and in your mastery data.
However, if they are not in the renshuu materials, then the system will assume that you added them, and will not touch them.
My intention is only to fix the renshuu-maintained materials, and any presumed duplication that came from that.
It might be worthwhile to have a preliminary report sent out to each user, and then a "would you like to merge these?" - that way, the final step is in the hands of the user.
Regardless of what the user chooses, though, the renshuu materials themselves will be fixed.
固苦しい read as かたくるしい and かたぐるしい <-- those would not be considered duplicates. They are linked together in the dictionary display, but are considered separate.
The only duplicates are ones with the SAME underlying reading but a different kanji layout (and in 90% of cases, it's not different kanji, but rather the presence or absence of a kanji, like the あさごはん example above).
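As a rough illustration of that rule (a sketch only, not the real implementation): group written forms by dictionary entry and reading, and treat a group as duplicates only when it holds more than one written form. The `entry_id` field and the tuple layout are assumptions for the example.

```python
from collections import defaultdict


def find_duplicate_groups(terms):
    """terms: iterable of (entry_id, reading, written_form) tuples.
    entry_id is a hypothetical key linking written variants of one word.
    Returns only the groups that share an entry AND a reading."""
    groups = defaultdict(list)
    for entry_id, reading, written in terms:
        groups[(entry_id, reading)].append(written)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


terms = [
    (1, "あさごはん", "朝ごはん"),
    (1, "あさごはん", "朝ご飯"),
    (2, "かたくるしい", "固苦しい"),
    (2, "かたぐるしい", "固苦しい"),
]
duplicates = find_duplicate_groups(terms)
```

The 朝ごはん / 朝ご飯 pair lands in one group because the reading matches, while the two readings of 固苦しい stay separate and are left alone.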
Advanced search will not necessarily filter those out, so I'd like to see some examples of what you're seeing. The "duplicates" in the dictionary are correct and accurate - the issue I am addressing here is renshuu-maintained lists (which are 90% of what users study on renshuu, I'm guessing) using more than one version, which clogs up the lists and makes studying (slightly, but more than zero) less effective.