Content metadata: automated labelling and the CEFR
Companies that want to keep up with market developments can't do without well-organised metadata at the most granular level. To be future-proof, they should embrace automated labelling.
Labels and metadata are used at various levels. In this blog series, we're focusing on content metadata, a field in its own right. Today, it's time for a closer look at the Common European Framework of Reference (CEFR). What is it, why and how should you use automated labelling, and what are the benefits of automation?
The CEFR: a brief explanation
The CEFR aims to provide a common basis for learning, teaching, and assessment that can be used for all European languages. It is a reference framework rather than a teaching method: it describes an individual's language proficiency on six reference levels, from A1 to C2, which makes that proficiency easier to assess and compare.
The 'why' of automated labelling
If you're a publisher that wants to create a textbook for people at the CEFR's B1 level, your content needs to meet the corresponding requirements. This means you should be able to test and analyse its readability.
Teachers and students, in turn, will want to find materials that meet their needs. They should be able to look for the right content by using level and topic filters.
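To make the idea of level and topic filters a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Resource record, the find function, and the small catalogue are hypothetical and not part of any particular product. Only the six level codes come from the CEFR itself.

```python
from dataclasses import dataclass

# The six CEFR reference levels, lowest to highest.
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


@dataclass(frozen=True)
class Resource:
    """Hypothetical catalogue entry carrying level and topic metadata."""
    title: str
    level: str
    topics: frozenset


def find(resources, level=None, topic=None):
    """Return the resources matching the given level and/or topic filter."""
    if level is not None and level not in CEFR_LEVELS:
        raise ValueError(f"unknown CEFR level: {level!r}")
    return [
        r for r in resources
        if (level is None or r.level == level)
        and (topic is None or topic in r.topics)
    ]


# Invented example data, for illustration only.
CATALOGUE = [
    Resource("Booking a hotel", "B1", frozenset({"travel", "dialogue"})),
    Resource("At the airport", "A2", frozenset({"travel"})),
    Resource("Writing a complaint", "B1", frozenset({"correspondence"})),
]

for resource in find(CATALOGUE, level="B1", topic="travel"):
    print(resource.title)
```

The filters only work as well as the labels behind them, which is why consistent labelling matters so much.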
How to use automated labels
Labelling used to be a manual task. Language experts would label large numbers of documents based on their knowledge and experience. Now, you can use these documents to train an AI model: based on the existing labelled content, it will learn to apply labels in an automated way.
After checking the model's automatically generated labels against the experts' labels, you'll have a validated machine learning model that you can rely on to label new texts consistently. You'll be able to repeat the process as often as you like and label all your content according to the CEFR.
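The train-then-validate loop described above can be sketched in a few lines of Python. This is a deliberately tiny stand-in, not the model a real labelling pipeline would use: it is a nearest-centroid classifier over two crude readability features (average sentence length and average word length). The function names and the example sentences are invented for illustration, and a real project would need far more expert-labelled text per level.

```python
import re
from collections import defaultdict
from statistics import mean


def features(text):
    """Two crude features: mean words per sentence and mean word length."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return (len(words) / len(sentences), mean(len(w) for w in words))


def train(labelled):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = defaultdict(list)
    for text, label in labelled:
        grouped[label].append(features(text))
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest to the text's features."""
    f = features(text)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(f, centroids[label])),
    )


def accuracy(centroids, held_out):
    """Share of held-out texts whose predicted label matches the expert label."""
    hits = sum(predict(centroids, text) == label for text, label in held_out)
    return hits / len(held_out)


# Invented toy data standing in for expert-labelled documents.
TRAIN = [
    ("I have a dog. He is big. We play every day.", "A1"),
    ("She has a red car. It is new. I like it.", "A1"),
    ("Although the committee acknowledged considerable methodological "
     "limitations, it nevertheless recommended implementing the proposed "
     "legislation immediately.", "C1"),
    ("Notwithstanding persistent economic uncertainty, manufacturers "
     "anticipate substantial improvements throughout the forthcoming "
     "financial quarter.", "C1"),
]
# Labelled texts kept out of training, used only for validation.
HELD_OUT = [
    ("My cat is old. She sleeps a lot.", "A1"),
    ("Contemporary researchers increasingly emphasise interdisciplinary "
     "collaboration when investigating complicated environmental "
     "phenomena.", "C1"),
]

model = train(TRAIN)
print("held-out accuracy:", accuracy(model, HELD_OUT))
```

The important design choice is the split: the model only ever learns from TRAIN, and the accuracy is measured on texts it has not seen. If the held-out accuracy is too low, you go back to the experts for more labelled examples before trusting the model with unlabelled content.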
Benefits of automation
A language expert might need about five to ten minutes per page to determine the correct label. A machine learning model can process hundreds of pages within the same period. So, it will save you a ton of time and energy!
Of course, there are other labels you can use for educational purposes. Want to know more about them? Keep an eye on our upcoming blog posts. Next time, we'll discuss keyword extraction.