
Forums - Fall 2024 Focus: Text Analyzer Improvements

Top > renshuu.org > Feature Requests/Improvements

Page: 4 of 4



マイコー
Level: 281

I've started adjusting the url (mainly for browser users) so that you have a url to jump into if you want to return to a certain page.

Also, back button does work to get you out of the text and back to the main list of texts, but I'm having trouble (without having to change a ton of other stuff) allowing the back button to just go back one page (if you are looking at a multi-page text).

1
1 month ago
Gibolt
Level: 519

If you include page number and update the url param correctly, back should work as expected to go to previous page (if navigated from). It would be unexpected to go to page 3 from 4, if I'd opened 4 directly or from 1.


For the suggested page tracking (if you end up working on it), could just be a return to (ex) Page 4 under/next to the doc title on the main list

0
1 month ago
Gibolt
Level: 519

Tried using the filters, and noticed that the Kanji in particular don't highlight which uses you know or their school level. That would be OK, but there also isn't a quick way to view it in the dictionary. Clicking the character does not open it like in most places. Copying + pasting is tedious on mobile.

Just adding the dictionary icon to the icon list should be enough (which would benefit the Vocab items as well, to see more details), and/or enabling clicking the character.

0
1 month ago
マイコー
Level: 281

It's not that easy, unfortunately. renshuu has a home-grown SPA (single-page app) implementation which works, but is not what you'd call "thoroughly robust". So moving back is not just loading the pages (which could be done if the SPA was not in place), but a ton of JS manipulating the browser history and cache for various speed benefits.

So while I'm pretty confident that it can be done, I'm not quite sure how to handle it.

I'll add some technical notes at the bottom.

As to the missing lightbulb - I'll consider adding that later, but not at the moment. That term list is the same as the term list everywhere else, so I'd need to make special options there to handle the extra icons.



Technical stuff.

So, for most devices, renshuu holds 3 or 4 of the last pages/modals (modals = term lists, grammar pages, and now, text analyzer reader) in a small cache. When a new page is loaded, the url is rewritten by JS, and then the page is loaded via ajax and put in place. Press back, and it rolls the window's history back one, which triggers the cache system to not actually reload the previous page, but to toss the cached page back up (unless the cache runs out).
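
The cache behaviour described above can be sketched roughly as follows. This is only an illustration, not renshuu's actual code: the `PageCache` class, its methods, and the size limit are hypothetical names chosen for the example, and a real implementation would be driven by the browser's history events rather than called by hand.

```javascript
// Hypothetical sketch of a small "last few pages" cache (not renshuu's real code).
class PageCache {
  constructor(maxEntries = 4) {
    this.maxEntries = maxEntries; // assumption: 3 or 4 entries, as the post describes
    this.entries = [];            // oldest first, newest last
  }

  // Called after a new page has been fetched and put in place.
  push(url, html) {
    this.entries.push({ url, html });
    if (this.entries.length > this.maxEntries) {
      this.entries.shift(); // drop the oldest page once the cache is full
    }
  }

  // Called when the user presses back: discard the current page and
  // return the cached previous page, or null if the cache has run out.
  back() {
    this.entries.pop();
    return this.entries.length > 0
      ? this.entries[this.entries.length - 1]
      : null;
  }
}

const cache = new PageCache(3);
cache.push("/text/1000/0", "<p>page 0</p>");
cache.push("/text/1000/1", "<p>page 1</p>");
const previous = cache.back();
console.log(previous.url);
```

When `back()` returns `null`, the page would have to be reloaded from the server instead of being restored from the cache.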

Here's the problem: consider a 5 page reader.

When you open it, the page's url will be /text/text_id/pg_number, or as an example, /text/1000/0

If you go to the next page, it is /text/1000/1, /text/1000/2, and so on.

If you were to use the back button, this would normally work fine - it would grab the last cached copy, which is the previous page. Easy!

However, let's say the cache has three pages on it:

/text/1000/0

/text/1000/1

/text/1000/2

If you then press the x in the top right, you'd want to clear out those 3 entries, and return to whatever was before it (the main TA page).

I originally tried something like this (pseudo-code):

while (last cache entry is a reader page) {
    // add code to prevent the page from actually showing the cache for all the intermediary pages
    window.history.back();
}

You may think this would work, as it would clean off all the cached pages, and clear the urls out of the browser's history (which you cannot do manually - you can either replace the current one, or add onto it).

However, window.history.back() is asynchronous! AHHHH. So that loop would run 1000 times before the 1st one finishes, and the browser would (and did) crash.

The main issue with all of this is that you cannot easily manipulate the browser history when going backwards - you must use window.history.back().

Tricky.
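
One possible way around the loop, offered only as a sketch: the standard `history.go(delta)` method accepts a negative number, so the reader pages could be counted first and skipped in a single asynchronous jump. Everything else here is an assumption made for the example: `visitedUrls` is a hypothetical app-maintained list of the URLs pushed so far, `/texts` stands in for the main TA page, and `fakeHistory` replaces `window.history` so the code runs outside a browser. Whether this fits renshuu's SPA is not something the thread confirms.

```javascript
// Hypothetical sketch: leave a multi-page reader with one history jump.
// `visitedUrls` is an assumed list of pushed URLs (oldest first);
// `historyLike` would be window.history in a browser.
function isReaderUrl(url) {
  return /^\/text\/\d+\/\d+$/.test(url);
}

function closeReader(visitedUrls, historyLike) {
  // Count how many reader pages sit at the end of the list.
  let steps = 0;
  while (
    steps < visitedUrls.length &&
    isReaderUrl(visitedUrls[visitedUrls.length - 1 - steps])
  ) {
    steps += 1;
  }
  if (steps > 0) {
    // Forget the reader pages, then jump back once instead of looping.
    visitedUrls.splice(visitedUrls.length - steps, steps);
    historyLike.go(-steps);
  }
  return steps;
}

// Stand-in for window.history so the sketch can run outside a browser.
const fakeHistory = {
  calls: [],
  go(delta) {
    this.calls.push(delta);
  },
};

const visited = ["/texts", "/text/1000/0", "/text/1000/1", "/text/1000/2"];
const stepsBack = closeReader(visited, fakeHistory);
console.log(stepsBack, visited, fakeHistory.calls);
```

Note that jumping back does not delete the skipped entries; they remain as forward history until the next new page is pushed.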



1
1 month ago
マイコー
Level: 281

**[Improvement]** Text Analyzer has new reader settings for always or never showing furigana.


Also, I am looking into how to implement a way to show word frequency. I have all that data already (words, kanji, and grammar), however, I am not quite sure how we should add it into the tools.


I could just add another category here which is something like "Most common terms in text" (but something shorter), then perhaps a selector that lets you say how frequent you want them to be.

This is where it could get real messy. Since texts are of different lengths, we don't really want "more than xx times in the text".

Maybe "The top xx% most common words in the text", with xx being 10, 20, 30, 40, 50, etc.?

When this is selected (and perhaps, even without), I could have a bit of extra text in the term list that shows # of occurrences.

Thoughts?

Ninja edit: added small text for # of times in text.
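
To make the "top xx%" idea concrete, here is a minimal sketch under one possible reading: take the top xx% of the distinct terms, ranked by number of occurrences. The function names and the sample terms are hypothetical, and the input is assumed to be an already-tokenized list of terms (the parsing itself is not shown).

```javascript
// Hypothetical sketch: pick the top `percent`% most frequent distinct terms.
function countTerms(terms) {
  const counts = new Map();
  for (const term of terms) {
    counts.set(term, (counts.get(term) || 0) + 1);
  }
  return counts;
}

function topPercentTerms(terms, percent) {
  const ranked = [...countTerms(terms).entries()]
    .map(([term, occurrences]) => ({ term, occurrences }))
    .sort((a, b) => b.occurrences - a.occurrences); // most frequent first
  // Round up so that a small percentage still returns at least one term.
  const keep = Math.ceil((ranked.length * percent) / 100);
  return ranked.slice(0, keep);
}

const sampleTerms = ["猫", "犬", "猫", "鳥", "猫", "犬", "魚", "本"];
const topTerms = topPercentTerms(sampleTerms, 40);
console.log(topTerms);
```

Because the cut-off is relative to the number of distinct terms, the same setting behaves comparably on short and long texts, and each returned entry carries the # of occurrences for display in the term list.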

3
1 month ago
キップ
Level: 210

I'd like to see the "always use kana when the original text does" option expanded to have a second, not mutually exclusive option: "keep usage of rare kanji intact where the original text used them". (Presumably it wouldn't do this if my settings had rare kanji spellings intact, but I got tired of learning them for words where it was only used some 0.00001% of the time when I enabled it to see and learn cases where it's used 5~20% of the time. I'm excited to see the per-word toggle for that option, since you did mention you were working on that!)

Thank you for everything, as always!!

0
1 month ago
gillianfaith
Level: 1044

The software I currently use to generate frequency lists, cb's Japanese Text Analysis Tool, displays the following vectors for frequency:

  • Number of occurrences
  • Overall frequency ranking
  • Term's use as a percentage of the whole text
  • Cumulative usage percentage of the term + every term ranked before it

I use all of that information at some point or another when making vocab lists, but it's a bit excessive to present as anything other than a spreadsheet. The cumulative percentage is probably the most useful and what I'd be most excited to see from the Text Analyzer, because it deals directly with the question of "How many words do I need to learn to understand 50% of this text? 80%?", etc., and I think being able to generate a list of terms in the Text Analyzer based on a percentage comprehension goal would add a lot of value.
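
The four numbers listed above, and the "how many words for 50%?" question, can be illustrated with a short sketch. This is not how cb's tool or the Text Analyzer computes them; `frequencyTable` and `termsForCoverage` are hypothetical names, and the input is assumed to be an already-tokenized list of terms.

```javascript
// Hypothetical sketch: occurrences, rank, percentage, and cumulative percentage.
function frequencyTable(terms) {
  const counts = new Map();
  for (const term of terms) {
    counts.set(term, (counts.get(term) || 0) + 1);
  }
  const total = terms.length;
  let running = 0; // occurrences of this term plus every term ranked before it
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .map(([term, occurrences], index) => {
      running += occurrences;
      return {
        term,
        occurrences,
        rank: index + 1,
        percent: (occurrences / total) * 100,
        cumulativePercent: (running / total) * 100,
      };
    });
}

// Shortest prefix of the ranked list whose cumulative percentage reaches the goal.
function termsForCoverage(table, goalPercent) {
  const result = [];
  for (const row of table) {
    result.push(row.term);
    if (row.cumulativePercent >= goalPercent) break;
  }
  return result;
}

const table = frequencyTable(["猫", "犬", "猫", "鳥", "猫", "犬", "魚", "本"]);
const halfCoverage = termsForCoverage(table, 50);
console.log(table, halfCoverage);
```

In this sample of 8 terms, 猫 (3 occurrences) and 犬 (2 occurrences) together account for 62.5% of the text, so those two terms are enough to pass a 50% goal.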

I don't see much of a problem with a simple "more than xx times in the text" or "top xx most frequent words" type of list, if that number was just left blank by default and entered by the user, but I agree that going by percentages would be a cleaner solution to account for the wide range of possible text lengths. For me, the benefit of using a straight number instead of a percentage is mostly just to isolate words that are only used once, so regardless I'm a fan of showing the # of occurrences in the term list.


2
1 month ago
Gibolt
Level: 519

An alternative would be a pair of range inputs or dual sliders, with min/max and a checkbox for property (occurrences/percentage/rank).
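
As a rough sketch of what such a range filter might do behind the sliders (the `filterByRange` function and the row shape are hypothetical, not part of renshuu):

```javascript
// Hypothetical sketch: keep rows whose chosen property lies within [min, max].
function filterByRange(rows, property, min = -Infinity, max = Infinity) {
  return rows.filter((row) => row[property] >= min && row[property] <= max);
}

const rows = [
  { term: "猫", occurrences: 3, rank: 1, percent: 37.5 },
  { term: "犬", occurrences: 2, rank: 2, percent: 25 },
  { term: "鳥", occurrences: 1, rank: 3, percent: 12.5 },
];

// Example: hide terms that occur only once.
const repeated = filterByRange(rows, "occurrences", 2);
console.log(repeated.map((row) => row.term));
```

Leaving `min` or `max` unset keeps that side of the range open, which matches a slider left at its extreme.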

1
1 month ago
マイコー
Level: 281

Making those numbers would be trivially easy, but not sure if it'd be better as a display, or a download.

0
1 month ago
Gibolt
Level: 519

I tend to prefer custom filtering inline, especially since a download would already make filtering easyish in a spreadsheet tool, if you plan to do something in bulk offline with it.

Although, very small pagination is a regular limitation I encounter on the site. To me, if anything where it's easy to hit the pagination limit had an <all> option or similar in the page dropdown, I'd be very happy.

0
1 month ago

Maybe this feature already exists and I’m just not seeing it, but it sure would be nice to have support for ruby characters.

For example, if I paste this text into the text analyzer

the ruby text in the last line becomes あやしい, so I have to manually go and delete either or あや. I haven’t really experimented with square brackets and double slashes, but even if they work that’s more fussing around that I’d rather have automated.

Also, since the majority of the un-copyright-encumbered text on the web is prewar vintage, it would be great to have basic support for modernizing old-style writing, things like だつた→だった, してゐた→していた, いふ→いう, etc.

0
1 month ago
VoidWinter
Level: 49

My 2 cents. For frequency I'm used to just an integer "occurs xx or more times", but as long as it's clear what percentage means (percent of what? Total words in the passage?) that's fine too.

I like integers because it's easy to visualize and to exclude words that only occur a handful of times.


1
1 month ago
マイコー
Level: 281

Unfortunately, there is really not going to be a good way to remove ruby stuff. I cannot even begin to think of a way to handle that in a way that is going to be even remotely accurate :(

For similar reasons, I probably will not be able to take older Japanese and adjust it in a way that is going to be accurate.


0
1 month ago
マイコー
Level: 281

Although I had this planned as a feature later on, I was able to link the recent downtime on renshuu to some "bad" processing on text analyzer files. Nothing on the security end, but there is a small chunk of code that can get out of control, and I'm looking to pull it out.

I will most likely make this change in the next day or two, so any bugs can be reported here.

Currently, you can see that when you submit a text for analysis, it gives you back an estimated word count fairly quickly, before processing is done. This is due to a second text parser that is quick but less accurate. It's used for the temporary word count, and more importantly, breaking the text into (roughly) 1000 "word" chunks for easier processing and display in the reader.

This one is purely number-based (as to where it cuts off each chunk), so it can come right in the middle of a sentence.

Instead, I'd like it to make clean breaks at the end of lines. After thinking on it, I think I can get both the server issue and this feature nailed down in one go, but as always, it may adjust the way in which the text is parsed and displayed.
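
A minimal sketch of chunking that only breaks at the end of a line might look like this. It is not the actual renshuu parser: `estimateWords` is a deliberately crude stand-in for the quick word counter (it assumes about 2 characters per word), and `chunkByLines` is a hypothetical name.

```javascript
// Crude stand-in for the quick, less accurate parser:
// assumes roughly 2 characters per "word" (an assumption for this example only).
function estimateWords(line) {
  return Math.ceil(line.trim().length / 2);
}

// Split text into chunks of about `targetWords` words, breaking only between lines.
function chunkByLines(text, targetWords = 1000) {
  const chunks = [];
  let current = [];
  let currentWords = 0;
  for (const line of text.split("\n")) {
    const words = estimateWords(line);
    // Start a new chunk if this line would push the current one past the target.
    if (current.length > 0 && currentWords + words > targetWords) {
      chunks.push(current.join("\n"));
      current = [];
      currentWords = 0;
    }
    current.push(line);
    currentWords += words;
  }
  if (current.length > 0) {
    chunks.push(current.join("\n"));
  }
  return chunks;
}

const sampleText = ["あいうえお", "かきくけこ", "さしすせそ", "たちつてと"].join("\n");
const chunks = chunkByLines(sampleText, 6);
console.log(chunks);
```

With this approach a single line longer than the target still ends up in one chunk of its own, so chunk sizes are only approximate, but no chunk ever ends in the middle of a line.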

3
9 days ago
マイコー
Level: 281

The new feature is in place. For most people, you will not notice any large changes, but do let me know if you run into any issues.

2
9 days ago