掲示板 Forums - Fall 2024 Focus: Text Analyzer Improvements

Top > renshuu.org > Feature Requests/Improvements > Finished/Rejected Requests

マイコー

Level: 341

I've started adjusting the URL (mainly for browser users) so that you have a URL to jump to if you want to return to a certain page.

Also, the back button does work to get you out of the text and back to the main list of texts, but I'm having trouble (without having to change a ton of other stuff) making the back button go back just one page when you are looking at a multi-page text.

1 year ago

Gibolt

Level: 1117

If you include the page number and update the URL parameter correctly, back should work as expected and go to the previous page (if you navigated from it). It would be unexpected to go to page 3 from page 4 if I'd opened page 4 directly or from page 1.

> マイコー wrote on Sep 17, 19:34:
>
> I've started adjusting the URL (mainly for browser users) so that you have a URL to jump to if you want to return to a certain page.

For the suggested page tracking (if you end up working on it), it could just be a "Return to page 4" (for example) under or next to the document title on the main list.

1 year ago

Gibolt

Level: 1117

I tried using the filters and noticed that the kanji in particular don't highlight which uses you know or their school level. That would be OK, but there also isn't a quick way to view them in the dictionary: clicking the character does not open it like it does in most places, and copying and pasting is tedious on mobile.

Just adding the dictionary icon to the icon list should be enough (which would also benefit the vocab items, to see more details), and/or making the character clickable.

1 year ago

マイコー

Level: 341

> Gibolt wrote on Sep 18, 11:55:
>
> > マイコー wrote on Sep 17, 19:34:
> >
> > I've started adjusting the URL (mainly for browser users) so that you have a URL to jump to if you want to return to a certain page.
>
> For the suggested page tracking (if you end up working on it), it could just be a "Return to page 4" (for example) under or next to the document title on the main list.

It's not that easy, unfortunately. renshuu has a home-grown SPA (single-page app) implementation which works, but is not what you'd call *thoroughly robust*. So moving back is not just a matter of loading the pages (which could be done if the SPA were not in place); there is a ton of JS manipulating the browser history and cache for various speed benefits.

So while I'm pretty confident that it can be done, I'm not quite sure how to handle it.

I'll add some technical notes at the bottom.

As to the missing lightbulb - I'll consider adding that later, but not at the moment. That term list is the same as the term list everywhere else, so I'd need to make special options there to handle the extra icons.

Technical stuff.

So, on most devices, renshuu keeps the last 3 or 4 pages/modals (modals = term lists, grammar pages, and now the text analyzer reader) in a small cache. When a new page is loaded, the URL is rewritten by JS, and then the page is loaded via AJAX and put in place. Press back, and it rolls the window's history back one entry, which triggers the cache system to not actually reload the previous page but to put the cached copy back up (unless the cache runs out).
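To make the cache behaviour concrete, here is a minimal sketch of a small "last few pages" cache. It is not renshuu's actual code (which isn't shown in this thread); the `PageCache` class, its default capacity of 4, and the `{ url, html }` entry shape are assumptions for illustration.

```javascript
// Hypothetical sketch of a small "last N pages" cache; not renshuu's real code.
class PageCache {
  constructor(capacity = 4) {
    this.capacity = capacity;
    this.entries = []; // oldest first, each entry is { url, html }
  }

  // Called after a page has been loaded via AJAX and put in place.
  push(url, html) {
    this.entries.push({ url, html });
    if (this.entries.length > this.capacity) {
      this.entries.shift(); // forget the oldest page once the cache is full
    }
  }

  // Called when Back is pressed: discard the current page and return the cached
  // previous one, or null if the cache has run out (the page must then be reloaded).
  back() {
    this.entries.pop();
    return this.entries[this.entries.length - 1] ?? null;
  }
}

const cache = new PageCache(3);
cache.push("/text", "<main list html>");
cache.push("/text/1000/0", "<page 0 html>");
cache.push("/text/1000/1", "<page 1 html>");
console.log(cache.back().url); // → /text/1000/0
```

Once more pages have been visited than the capacity allows, the oldest ones fall off, which is the "unless the cache runs out" case: `back()` returns `null` and the page has to be fetched again.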

Here's the problem: consider a 5 page reader.

When you open it, the page's URL will be /text/text_id/pg_number; for example, /text/1000/0.

If you go to the next page, it becomes /text/1000/1, then /text/1000/2, and so on.
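Keeping the page number in the URL, as suggested earlier in the thread, mostly comes down to building and reading this `/text/<text_id>/<pg_number>` pattern. A sketch only: the function names are hypothetical, and it assumes text IDs and page numbers are non-negative integers, with page numbers zero-based as in `/text/1000/0`.

```javascript
// Hypothetical helpers for the /text/<text_id>/<pg_number> URL scheme.
function readerUrl(textId, page) {
  return `/text/${textId}/${page}`;
}

// Returns { textId, page } for a reader URL, or null for any other path
// (for example the main text analyzer list).
function parseReaderUrl(path) {
  const match = /^\/text\/(\d+)\/(\d+)$/.exec(path);
  if (match === null) return null;
  return { textId: Number(match[1]), page: Number(match[2]) };
}

console.log(readerUrl(1000, 2));                  // → /text/1000/2
console.log(parseReaderUrl("/text/1000/2").page); // → 2
```

In the browser, going to the next page could call `history.pushState(null, "", readerUrl(1000, 3))`, and a `popstate` listener could pass `location.pathname` to `parseReaderUrl` to decide which page to show. Using `history.replaceState` instead would keep the page number in the URL without adding one history entry per page.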

If you were to use the back button, this would normally work fine: it would grab the last cached copy, which is the previous page. Easy!

However, let's say the cache has three pages on it:

/text/1000/0

/text/1000/1

/text/1000/2

If you then press the x in the top right, you'd want to clear out those 3 entries and return to whatever was before them (the main TA page).

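I originally tried something like this (pseudocode):

```javascript
// Keep going back while the newest cache entry is a reader page
while (isReaderPage(lastCacheEntry())) {
  // add code to prevent the page from actually showing the cache for all the intermediary pages
  window.history.back(); // only schedules the navigation; it returns immediately
}
```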

You may think this would work, as it would clean off all the cached pages and clear the URLs out of the browser's history (which you cannot do manually: you can only replace the current entry or add a new one).

However, window.history.back() is asynchronous! AHHHH. The cache can't change until the navigation actually happens, so the loop condition never changes while the loop is running: it would run 1000 times before the first back() finishes, and the browser would (and did) crash.

The main issue with all of this is that you cannot easily manipulate the browser history when going backwards; you have to use window.history.back().

Tricky.
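One possible way around the runaway loop, sketched under assumptions: the app keeps its own list of the URLs it has pushed (`trail`, hypothetical), because scripts cannot read the browser's history entries. It counts how many of the newest entries are reader pages, then jumps back past all of them with a single `history.go(-n)` call. `history.go()` is also asynchronous, but it is called only once, so there is no loop waiting on it. The URL pattern assumes the `/text/<text_id>/<pg_number>` scheme described above.

```javascript
const READER_PAGE = /^\/text\/\d+\/\d+$/; // /text/<text_id>/<pg_number>

// Count how many of the most recent entries in our own (hypothetical) trail are reader pages.
function readerEntriesToSkip(trail) {
  let n = 0;
  for (let i = trail.length - 1; i >= 0 && READER_PAGE.test(trail[i]); i--) {
    n++;
  }
  return n;
}

// Close the reader: drop the reader pages from our trail and, in a browser,
// jump back past all of them at once. The single popstate event that follows
// is where the cache would put the main list back up.
function closeReader(trail) {
  const n = readerEntriesToSkip(trail);
  trail.splice(trail.length - n, n);
  if (n > 0 && typeof window !== "undefined") {
    window.history.go(-n);
  }
  return n;
}

const trail = ["/text", "/text/1000/0", "/text/1000/1", "/text/1000/2"];
console.log(closeReader(trail), trail.join(",")); // → 3 /text
```

Because only one traversal happens, the intermediate reader pages are never shown, so the "prevent the cache from showing intermediary pages" step from the pseudocode is not needed in this approach.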

1 year ago

マイコー

Level: 341

**[Improvement]** Text Analyzer has new reader settings for always or never showing furigana.

Also, I am looking into how to implement a way to show word frequency. I have all that data already (words, kanji, and grammar); however, I am not quite sure how we should add it to the tools.

I could just add another category here which is something like "Most common terms in text" (but something shorter), then perhaps a selector that lets you say how frequent you want them to be.

This is where it could get real messy. Since texts are of different lengths, we don't really want "more than xx times in the text".

Maybe "The top xx% most common words in the text", with xx being 10, 20, 30, 40, 50, etc.?

When this is selected (and perhaps, even without), I could have a bit of extra text in the term list that shows # of occurrences.
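A minimal sketch of the "top xx% most common words" idea, keeping the # of occurrences next to each term. It assumes the analyzer can hand over one entry per occurrence (e.g. dictionary forms) as a plain array; `countTerms` and `topPercent` are hypothetical names, and "top xx%" is read here as the first xx% of the distinct terms when ranked by occurrences.

```javascript
// Hypothetical helpers; `terms` has one entry per occurrence in the text.
function countTerms(terms) {
  const counts = new Map();
  for (const term of terms) {
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }
  // Most frequent first. Map keeps first-seen order and sort() is stable,
  // so ties stay in order of first appearance.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// "Top xx%": the first xx% of the distinct terms, rounded up so that a short
// text still returns at least one term for any xx above 0.
function topPercent(terms, percent) {
  const ranked = countTerms(terms);
  const keep = Math.ceil((ranked.length * percent) / 100);
  return ranked.slice(0, keep);
}

const words = ["猫", "の", "は", "の", "犬", "の", "は"];
console.log(JSON.stringify(topPercent(words, 50))); // → [["の",3],["は",2]]
```

Because the cut is a share of the distinct terms, the same setting behaves similarly for short and long texts, which is the point of avoiding "more than xx times in the text".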

Thoughts?

Ninja edit: added small text for # of times in text.

1 year ago

キップ

Level: 216

I'd like to see the "always use kana when the original text does" option expanded with a second, not mutually exclusive option: "keep rare kanji intact where the original text used them". (Presumably this wouldn't be needed if my settings kept rare kanji spellings, but I had enabled that to see and learn the cases where a kanji spelling is used 5~20% of the time, and got tired of learning spellings for words where it's only used some 0.00001% of the time. I'm excited to see the per-word toggle for that option, since you did mention you were working on it!)

Thank you for everything, as always!!

1 year ago

gillianfaith

Level: 1384

The software I currently use to generate frequency lists, cb's Japanese Text Analysis Tool, displays the following vectors for frequency:

- Number of occurrences
- Overall frequency ranking
- Term's use as a percentage of the whole text
- Cumulative usage percentage of the term + every term ranked before it

I use all of that information at some point or another when making vocab lists, but it's a bit excessive to present as anything other than a spreadsheet. The cumulative percentage is probably the most useful and what I'd be most excited to see from the Text Analyzer, because it deals directly with the question of "How many words do I need to learn to understand 50% of this text? 80%?", etc., and I think being able to generate a list of terms in the Text Analyzer based on a percentage comprehension goal would add a lot of value.
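The four measures listed above can all be derived from a list of occurrences. This is a sketch, not how renshuu or cb's tool computes them: it assumes `terms` holds one entry per occurrence, treats "percentage" as a share of all words in the text, and uses hypothetical names (`frequencyTable`, `termsForCoverage`). The second function answers "how many words do I need to understand N% of this text?" by taking the shortest run of top-ranked terms whose cumulative percentage reaches the goal.

```javascript
// Hypothetical sketch; `terms` has one entry per occurrence in the text.
function frequencyTable(terms) {
  const counts = new Map();
  for (const term of terms) {
    counts.set(term, (counts.get(term) ?? 0) + 1);
  }
  const ranked = [...counts.entries()].sort((a, b) => b[1] - a[1]);

  const total = terms.length;
  let running = 0;
  return ranked.map(([term, occurrences], i) => {
    running += occurrences;
    return {
      term,
      occurrences,                                // number of occurrences
      rank: i + 1,                                // overall frequency ranking
      percent: (100 * occurrences) / total,       // share of all words in the text
      cumulativePercent: (100 * running) / total, // this term + every term ranked above it
    };
  });
}

// Shortest run of top-ranked terms whose cumulative share reaches the goal;
// if the goal can't be reached, every term is returned.
function termsForCoverage(table, goalPercent) {
  const index = table.findIndex((row) => row.cumulativePercent >= goalPercent);
  return index === -1 ? table : table.slice(0, index + 1);
}

const table = frequencyTable(["猫", "の", "は", "の", "犬", "の", "は"]);
console.log(termsForCoverage(table, 80).map((row) => row.term).join(" ")); // → の は 猫
```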

I don't see much of a problem with a simple "more than xx times in the text" or "top xx most frequent words" type of list, if that number was just left blank by default and entered by the user, but I agree that going by percentages would be a cleaner solution to account for the wide range of possible text lengths. For me, the benefit of using a straight number instead of a percentage is mostly just to isolate words that are only used once, so regardless I'm a fan of showing the # of occurrences in the term list.

1 year ago

Gibolt

Level: 1117

An alternative would be a pair of range inputs or dual sliders, with min/max values and a checkbox for the property (occurrences/percentage/rank).
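The range idea maps onto a small filter. In this sketch the row shape (`occurrences`, `percent`, `rank`, e.g. from a frequency table) and the `filterByRange` name are assumptions; both bounds are inclusive, and leaving one out means "no limit on that side", so "used more than once" is just a minimum of 2 occurrences.

```javascript
// Properties the (hypothetical) filter UI would offer.
const FILTERABLE = ["occurrences", "percent", "rank"];

// Keep rows whose chosen property lies in [min, max]; an undefined bound is ignored.
function filterByRange(rows, property, min, max) {
  if (!FILTERABLE.includes(property)) {
    throw new Error(`cannot filter on "${property}"`);
  }
  return rows.filter((row) => {
    const value = row[property];
    return (min === undefined || value >= min) && (max === undefined || value <= max);
  });
}

const rows = [
  { term: "の", occurrences: 3, percent: 42.9, rank: 1 },
  { term: "は", occurrences: 2, percent: 28.6, rank: 2 },
  { term: "猫", occurrences: 1, percent: 14.3, rank: 3 },
];
console.log(filterByRange(rows, "occurrences", 2).map((r) => r.term).join(" ")); // → の は
```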

1 year ago

マイコー

Level: 341

Making those numbers would be trivially easy, but I'm not sure if it'd be better as a display or as a download.
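If the numbers end up as a download, CSV is probably the simplest format for spreadsheet tools. A sketch under assumptions: the row shape and `toCsv` are hypothetical, and fields are quoted only when needed, following RFC 4180 (wrap in double quotes, double any embedded quotes).

```javascript
// Columns of the (hypothetical) frequency rows to export.
const COLUMNS = ["rank", "term", "occurrences", "percent"];

// Quote a field only if it contains a quote, comma, or line break (RFC 4180 rules).
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows) {
  const lines = [COLUMNS.join(",")];
  for (const row of rows) {
    lines.push(COLUMNS.map((column) => csvField(row[column])).join(","));
  }
  return lines.join("\r\n"); // RFC 4180 uses CRLF line breaks
}

console.log(toCsv([{ rank: 1, term: "の", occurrences: 3, percent: 42.9 }]));
```

In a browser, the string could be offered as a file via `new Blob(["\uFEFF" + csv], { type: "text/csv" })` and `URL.createObjectURL`; the leading byte-order mark helps some versions of Excel recognise the Japanese text as UTF-8.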

1 year ago

Gibolt

Level: 1117

I tend to prefer custom filtering inline, especially since a download would already make filtering fairly easy in a spreadsheet tool, if you plan to do something in bulk offline with it.

That said, very small pagination is a limitation I regularly run into on the site. If everything that easily hits the pagination limit had an "All" option or similar in the page dropdown, I'd be very happy.

1 year ago

ポールおじちゃん

Level: 1824

Maybe this feature already exists and I’m just not seeing it, but it sure would be nice to have support for ruby characters.

For example, if I paste this text into the text analyzer

the ruby text in the last line becomes 妖あやしい, so I have to manually go and delete either 妖 or あや. I haven’t really experimented with square brackets and double slashes, but even if they work that’s more fussing around that I’d rather have automated.

Also, since the majority of the un-copyright-encumbered text on the web is prewar vintage, it would be great to have basic support for modernizing old-style writing, things like だつた→だった, してゐた→していた, いふ→いう, etc.

1 year ago

VoidWinter

Level: 81

My 2 cents: for frequency I'm used to just an integer ("occurs xx or more times"), but as long as it's clear what the percentage means (percent of what? total words in the passage?), that's fine too.

I like integers because it's easy to visualize and to exclude words that only occur a handful of times.

1 year ago

マイコー

Level: 341

Unfortunately, there is really not going to be a good way to remove ruby stuff. I cannot even begin to think of a way to handle that in a way that is going to be even remotely accurate :(

For similar reasons, I probably will not be able to take older Japanese and adjust it in a way that is going to be accurate.

1 year ago

マイコー

Level: 341

Although I had this planned as a feature later on, I was able to link the recent downtime on renshuu to some "bad" processing on text analyzer files. Nothing on the security end, but there is a small chunk of code that can get out of control, and I'm looking to pull it out.

I will most likely make this change in the next day or two, so any bugs can be reported here.

Currently, when you submit a text for analysis, it gives you back an estimated word count fairly quickly, before processing is done. This comes from a second text parser that is quick but less accurate. It's used for the temporary word count and, more importantly, for breaking the text into (roughly) 1000-"word" chunks for easier processing and display in the reader.

This one is purely number-based (in where it cuts off each chunk), so a cut can land right in the middle of a sentence.

Instead, I'd like it to make clean breaks at the end of lines. After thinking on it, I think I can get both the server issue and this feature nailed down in one go, but as always, it may adjust the way in which the text is parsed and displayed.
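Breaking only at line ends can be sketched as a greedy loop: keep adding whole lines until the running count reaches the target, then start a new chunk. This is not renshuu's parser: `countWords` stands in for the quick, approximate counter mentioned above and must be supplied by the caller, and a single line longer than the target simply becomes one oversized chunk rather than being cut mid-sentence.

```javascript
// Greedily group whole lines into chunks of roughly `target` words.
// `countWords` is a caller-supplied (hypothetical) word counter for one line.
function chunkAtLineEnds(text, target, countWords) {
  const chunks = [];
  let current = [];
  let words = 0;
  for (const line of text.split("\n")) {
    current.push(line);
    words += countWords(line);
    if (words >= target) {
      // Close the chunk at the end of this line, never in the middle of one.
      chunks.push(current.join("\n"));
      current = [];
      words = 0;
    }
  }
  if (current.length > 0) {
    const tail = current.join("\n");
    // Fold a whitespace-only remainder (e.g. a trailing newline) into the last
    // chunk instead of producing an empty page.
    if (tail.trim() === "" && chunks.length > 0) {
      chunks[chunks.length - 1] += "\n" + tail;
    } else {
      chunks.push(tail);
    }
  }
  return chunks;
}

// Toy counter for the example: treat each character as one "word".
const chunks = chunkAtLineEnds("aaaa\nbb\ncccc\ndd", 5, (line) => line.length);
console.log(JSON.stringify(chunks)); // → ["aaaa\nbb","cccc\ndd"]
```

Joining the chunks with "\n" gives back the original text exactly, so nothing is lost or duplicated at the chunk boundaries.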

1 year ago

マイコー

Level: 341

The new feature is in place. For most people, you will not notice any large changes, but do let me know if you run into any issues.

1 year ago

DoubleShift

Level: 667

I'm not sure if this was suggested before, but when using the "work with this passage" tool, I think it would be quite useful to have an option to sort the list by frequency (times in text).

1 year ago

ハシュミナ

Level: 1293

I agree with VoidWinter, for me an integer to select for frequency would be the most understandable (even though I like the idea of coverage). Or it would be at least nice to be able to sort by frequency.

Some other things I thought about:

- New action: link parsed sentences to the vocabulary (Pro users could then have mined sentences for their vocabulary in the vocab sentence questions). Maybe limit it to vocab without sentences. The downside I see is that sentence quality might vary (e.g. very short sentences, broken-off sentences). Maybe only use sentences that are fully parsed (no unparsed words) and have a minimum length? Probably complicated to implement.

Or maybe don't do this automatically, but have an additional context button when going over the text in Read mode that lets you manually add a parsed sentence to your User Sentences.
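The "fully parsed, with a minimum length" rule is easy to express as a filter. A sketch only: the token shape (`surface` text plus `term`, with `term: null` marking an unparsed word), the `candidateSentences` name, and the default minimum of 10 characters are all hypothetical.

```javascript
// Return sentences (as strings) that use `vocabTerm`, contain no unparsed
// tokens, and are at least `minLength` characters long.
// Token shape { surface, term } with term === null for unparsed words is hypothetical.
function candidateSentences(sentences, vocabTerm, minLength = 10) {
  return sentences
    .filter((tokens) => {
      const fullyParsed = tokens.every((t) => t.term !== null);
      const usesTerm = tokens.some((t) => t.term === vocabTerm);
      // Length in UTF-16 units, which matches characters for ordinary kana and kanji.
      const length = tokens.reduce((n, t) => n + t.surface.length, 0);
      return fullyParsed && usesTerm && length >= minLength;
    })
    .map((tokens) => tokens.map((t) => t.surface).join(""));
}

const sentences = [
  [
    { surface: "猫", term: "猫" },
    { surface: "が", term: "が" },
    { surface: "好き", term: "好き" },
    { surface: "です", term: "です" },
    { surface: "。", term: "。" },
  ],
  [{ surface: "猫", term: "猫" }, { surface: "ニャ", term: null }],
];
console.log(JSON.stringify(candidateSentences(sentences, "猫", 5))); // → ["猫が好きです。"]
```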

1 year ago

マイコー

Level: 341

The frequency sort/filter functionality will definitely be added, but I do not believe I'll be able to get it done during this round of updates.

1 year ago
