Level of proficiency

Hi,

I would like to know how you calculate the level (A1 to C1) of a text. That would be very useful for justifying which texts I use with my students.

Thanks a lot

2 votes

Anthony Rancourt shared this idea · Jan 16, 2023

Admin Readlang (Language learning app, Readlang) responded · Jul 11, 2023

The difficulty of each text is calculated via quite a crude approach described here: https://readlang.uservoice.com/knowledgebase/articles/722085-how-is-the-difficulty-of-a-text-calculated

To be honest, the calculated levels should be taken with a pinch of salt.


Anna Vernerová commented · February 11, 2024 10:48 AM

I think relying on the 2000 most common word forms is not suitable for languages with rich inflection, where the 2000 most common lexemes (dictionary entries, lemmas) may easily produce tens or even hundreds of thousands of different word forms. I am seeing for Finnish the same problem that Den K reports for Turkish, and the reason is the same in both cases: these are agglutinative languages where words take on a large number of endings.
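To illustrate the problem, here is a minimal sketch of a "share of tokens among the top N word forms" score. This is only my reading of the approach, not Readlang's actual code; the function name `top_n_coverage` and the three-word toy list are made up for the example.

```python
import re

def top_n_coverage(text, frequency_list, n=2000):
    """Share of tokens in `text` found among the first `n` word forms.

    Hypothetical helper: `frequency_list` is assumed to be ordered from
    most to least frequent.
    """
    common = {word.lower() for word in frequency_list[:n]}
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in common for token in tokens) / len(tokens)

# Toy list that only contains base forms (Finnish: house, is, big).
toy_forms = ["talo", "on", "iso"]

# Every token is a listed base form, so coverage is complete.
print(top_n_coverage("talo on iso", toy_forms))

# "talossani" (in my house) is an inflected form of "talo" and is not
# in the list, so the same easy vocabulary scores lower.
print(top_n_coverage("talossani on iso", toy_forms))
```

The second sentence uses no harder vocabulary than the first, yet a form-based score treats it as harder.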

I can think of three different approaches to address this problem:
- use comparable word lists for multiple languages (https://en.wiktionary.org/wiki/Wiktionary:Frequency_lists, e.g. choose the OpenSubtitles collection) and, rather than calculating the proportion of words among the top 2000, calculate the average relative rank (e.g. "this text contains mostly words that appear in the top 5% of the list"). Because languages with richer inflection should produce longer word lists, the relative rank could be a good measure.

- take a list of not just the 2000 most common word forms but as many as you can possibly get, translate it into English, and compare the translations with the list of the 2000 most common English word forms (but be aware that for Finnish, for example, the translation will be something like 'of my house', and you only want to know whether 'house' appears in the English list). You will likely find that you can choose a cutoff such that the first X words mostly translate to common English words while words lower in the list mostly don't (even though it will of course not be entirely clear where that cutoff should be set).

- parse each text and then compare the resulting list of lemmas to the list of the X most common lemmas in the language (X < 2000, but I don't know by how much). A parser with a large number of language models (BY-NC-SA) would be https://ufal.mff.cuni.cz/udpipe/2.
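As a rough sketch of the first approach (average relative rank), something like the following could work. All names here are hypothetical, the toy word list stands in for a real frequency list, and how to weigh tokens that are missing from the list is left open.

```python
import re

def relative_rank_profile(text, frequency_list):
    """Return (mean relative rank, share of tokens missing from the list).

    Relative rank = position in the list / length of the list, so 0.05
    means "within the top 5% of the list". Hypothetical helper; the list
    is assumed to be ordered from most to least frequent.
    """
    rank = {}
    for position, word in enumerate(frequency_list, start=1):
        rank.setdefault(word.lower(), position)  # keep the first occurrence
    tokens = re.findall(r"\w+", text.lower())
    known = [rank[t] / len(frequency_list) for t in tokens if t in rank]
    mean_rank = sum(known) / len(known) if known else None
    missing_share = 1 - len(known) / len(tokens) if tokens else 0.0
    return mean_rank, missing_share

# Toy frequency list, most frequent first (an assumption for the demo).
toy_ranked = ["the", "a", "is", "cat", "dog",
              "house", "runs", "big", "small", "quantum"]

# Ranks 1, 4, 3 and 8 out of 10, all tokens found in the list.
print(relative_rank_profile("the cat is big", toy_ranked))

# "zebra" is not in the list and is reported in the missing share.
print(relative_rank_profile("the quantum zebra", toy_ranked))
```

Because the rank is divided by the list length, a language whose list is ten times longer is not penalised for having ten times more forms. Tokens absent from the list are reported separately rather than silently counted as rare.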

Den K commented · July 24, 2023 9:53 AM

Am I correct that it's not possible to change the level manually? Currently it grades an A1 Turkish short story (from a somewhat credible source) as C1, with no option to override the level.
