Technology

Hi, you’ve come to the Linguineo technology page. Well done. Now, the information you can find here will tell you all about the technology behind Linguineo. You’ll need a little focus to read it all. But since you clicked this page, you must surely have the brains for it.

TLDR:

At Linguineo we develop custom projects where users learn a language. Central to these projects is our own unique voicebot and speech recognition technology. Learners practice speaking and writing by talking to virtual characters, a.k.a. conversational AI.

Okay, so now for the long explanation. Our copywriter tried to make it a bit more fun for you to read, so thanks for giving it a go. She’ll be stoked! Also, be warned, if we say we’re the best at what we do one time too many: we are trying to sell this thing, obviously.

1. Our own speech recognition for language learners

(i.e. usable speech recognition for non-native speakers / language learners)

What is it?

Off-the-shelf speech recognition models do not work well enough for recognizing non-native speech (>20% word error rate), so we created custom speech recognition models for this (<10% word error rate). Although it would be great to be able to use an off-the-shelf solution and not have to maintain our own speech recognition models anymore, we don’t see this happening soon, especially not for lower-resource languages, even with all the rapid advances in AI. Lately, we have also been collaborating with the Esat research group at KULeuven to push these models even further.

What did we create ourselves?

  • Smart model selection (i.e. the system knows which model is best to use for which learner). It can only do this once a few sentences have been manually transcribed by a human, though. Once these are available, it calculates the error of various models on these sentences and, based on this, knows the accuracy of the ASR for that user and which model to select (see the first sketch after this list).
  • Custom-trained deep learning speech recognition models that start from a foundation model but are adapted using “transfer learning”: the end of the neural network is removed and retrained with a limited set of niche data (see the second sketch below). Our current models for non-native speakers are an example of this.
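
To make that model selection idea a bit more concrete, here is a minimal sketch (with hypothetical names, not our actual code) of how picking the best ASR model for a learner could work. It assumes each candidate model can transcribe audio and that the jiwer package is available for computing word error rates:

```python
# Minimal sketch of smart ASR model selection (hypothetical names):
# score every candidate model on a few human-transcribed sentences
# and pick the one with the lowest word error rate for this learner.

from jiwer import wer  # standard word-error-rate metric


def select_best_model(models, reference_samples):
    """models: dict mapping model name -> callable(audio) -> hypothesis text.
    reference_samples: list of (audio, human_transcription) pairs."""
    best_name, best_wer = None, float("inf")
    for name, transcribe in models.items():
        hypotheses = [transcribe(audio) for audio, _ in reference_samples]
        references = [text for _, text in reference_samples]
        error = wer(references, hypotheses)  # WER over all transcribed samples
        if error < best_wer:
            best_name, best_wer = name, error
    return best_name, best_wer
```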

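And a minimal sketch of the transfer-learning idea from the second bullet, using PyTorch purely as an illustration: the pretrained part of a foundation model is frozen and only a new final layer is retrained on a limited set of niche (non-native) data. The function and dimension names are placeholders, not our actual architecture:

```python
import torch.nn as nn

# Hypothetical illustration of transfer learning: keep the pretrained
# encoder of a foundation model, freeze it, and retrain only a new
# output layer on a small set of niche (non-native speech) data.

def adapt_foundation_model(foundation_model, hidden_dim, vocab_size):
    # Freeze all pretrained parameters so only the new head gets updated.
    for param in foundation_model.parameters():
        param.requires_grad = False

    # Replace the "end of the network" with a fresh, trainable output layer.
    new_head = nn.Linear(hidden_dim, vocab_size)
    return nn.Sequential(foundation_model, new_head)
```
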
2. Content creation tool for conversations

(i.e. non-technical people can quickly create a conversation, with any constraints, about any topic or story)

What is it?

Content creators often want to control the contents of a conversation in great detail, for didactical and other reasons. There are no off-the-shelf solutions that make this easy for language learning conversations, but our component does. The content creation tool offers varying degrees of complexity. A first version of a conversation – either open or steered – can be created from a few simple textual inputs, after which it can be practiced. The conversation can then be edited further, in varying degrees of detail, but this is only necessary in specific cases.

What did we create ourselves?

  • A very rich steered conversational model and chatbot engine with which any type of conversation can be created as a flowchart, and which can integrate seamlessly with any generative AI model if necessary (see the first sketch after this list). This component leans into creating a “formal grammar” for language, but whereas the academic world tries to do this exhaustively (losing 0% of the information), we do it in a much more practical and therefore much more feasible way. We also developed a content creation tool that allows anyone to make a conversation in 20 minutes, but also allows trained people to change every single thing about a language learning conversation (structure, tasks given, adaptivity, answer types, behaviour of the UX and of the bot, ...).
  • A scaffolding system of several components with which the conversation itself interprets the learner’s response and formulates an answer. The five typical parts of this chain are: caching systems, intent detection, sentence similarity, smaller deep learning BERT-based models and a regular or custom LLM at the end (see the second sketch below). The first parts are the quickest and cheapest to run, the last ones the slowest and most expensive (but the strongest). Each of these subsystems has its own long explanation. The custom LLM at the end we are currently pushing beyond the state of the art together with research group IDLab of UGent within our ongoing research project Capture.
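
To give a feel for what a steered conversation as a flowchart can look like, here is a first, purely illustrative sketch; the node structure and field names are invented for this example and are not our actual conversation format:

```python
# Purely illustrative: a steered conversation represented as a tiny flowchart.
# Each node has a bot line, the learner intents it accepts, and where each
# intent leads; a fallback node provides extra scaffolding when needed.

conversation = {
    "greet": {
        "bot_says": "Goedemiddag! Wat wil je bestellen?",
        "expected_intents": {
            "order_coffee": "confirm_order",  # learner orders a coffee
            "ask_menu": "show_menu",          # learner asks what is available
        },
        "fallback": "greet_hint",
    },
    "show_menu": {
        "bot_says": "We hebben koffie, thee en verse croissants.",
        "expected_intents": {"order_coffee": "confirm_order"},
        "fallback": "greet_hint",
    },
    "confirm_order": {
        "bot_says": "Prima, één koffie. Anders nog iets?",
        "expected_intents": {"say_no": "goodbye", "order_more": "confirm_order"},
        "fallback": "confirm_hint",
    },
}
```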

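The scaffolding chain itself is essentially a cascade from cheap to expensive: each stage only runs when the previous, cheaper one was not confident enough. A minimal sketch of that idea (the real subsystems are of course much richer than a single threshold check):

```python
# Illustrative cascade: try the cheapest interpreters first and only fall
# through to the slower, more expensive ones when confidence is too low.

def interpret(utterance, stages, threshold=0.8):
    """stages: ordered list of (name, handler) pairs, cheapest first, e.g.
    cache, intent detection, sentence similarity, BERT-based model, LLM.
    Each handler returns (answer, confidence)."""
    answer, name = None, None
    for name, handler in stages:
        answer, confidence = handler(utterance)
        if confidence >= threshold:
            break  # a cheaper stage was confident enough; stop here
    return answer, name  # otherwise the final LLM stage provides the answer
```
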
OK! That’s a lot to take in. So give your mind a break with this humorous interruption:

Two scientists walk into a bar:

“I’ll have an H2O.”

“I’ll have an H2O, too.”

The bartender gives them both water because he is able to distinguish the boundary tones that dictate the grammatical function of homonyms in coda position as well as pragmatic context.

Back to the technical part! 

3. Adaptivity algorithm

(i.e. it quickly builds a detailed profile of each learner and makes every conversation exercise useful for almost every learner)

What is it?

The adaptivity algorithm adds fine-grained adaptivity to our language learning conversations and mostly solves the problem of giving each learner a useful item with the right support. Beginners get very easy tasks with a lot of support; learners who know the language well but have a strong accent get tasks and support focused on speaking; advanced learners get minimal support; learners who are better at writing than at speaking get exercises and support on writing; and so on.

What did we create ourselves?

  • A system to deduce a rich learner and item profile from the “open data” (as described in our article about adaptivity).
  • A self-learning system that uses the rich learner and item profiles to predict what a “good next item” is for the learner (a toy illustration follows after this list).
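
As a toy illustration only (the actual algorithm, developed with Itec, is described in the blog article linked below and is far richer), you can think of a “good next item” predictor as something that scores candidate exercises against the learner profile and picks the most useful one:

```python
# Toy illustration only: score candidate exercises against a learner profile
# and pick the one predicted to be most useful. The real, self-learning
# algorithm (built with research group Itec) goes far beyond this.

def next_item(learner, items):
    def usefulness(item):
        # Prefer items slightly above the learner's estimated level ...
        challenge = 1.0 - abs(item["difficulty"] - (learner["level"] + 0.1))
        # ... and items that target the learner's weakest skill.
        focus = 1.0 if item["skill"] == learner["weakest_skill"] else 0.5
        return challenge * focus

    return max(items, key=usefulness)


learner = {"level": 0.4, "weakest_skill": "speaking"}
items = [
    {"id": "cafe_dialogue", "difficulty": 0.5, "skill": "speaking"},
    {"id": "email_writing", "difficulty": 0.5, "skill": "writing"},
]
print(next_item(learner, items)["id"])  # -> cafe_dialogue
```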

We posted a blog article explaining this approach in detail here.

That said, we didn’t create this algorithm all by ourselves: we created it in collaboration with research group Itec of KULeuven.

4. Personalized corrective feedback and pronunciation analysis

What is it?

A system that is able to give detailed corrective feedback on both grammar and pronunciation, taking into account the learner profile.

What did we create ourselves?

  • A pronunciation analysis component that can give detailed pronunciation feedback for a learner utterance, on both the word and phoneme level (see the sketch after this list).
  • A layered corrective feedback component that can give detailed grammar feedback on any learner utterance.
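
As a rough, hypothetical sketch of what phoneme-level feedback can boil down to: compare the expected phonemes of a word with the phonemes detected in the learner’s utterance and report the differences. (The names are invented for this example; the real component works on the audio signal itself and is considerably richer.)

```python
from difflib import SequenceMatcher

# Hypothetical simplification: align expected vs. detected phoneme sequences
# and turn the mismatches into feedback messages.

def phoneme_feedback(expected, detected):
    """expected / detected: lists of phoneme symbols for one word."""
    feedback = []
    for op, i1, i2, j1, j2 in SequenceMatcher(a=expected, b=detected).get_opcodes():
        if op == "replace":
            feedback.append(f"expected {expected[i1:i2]} but heard {detected[j1:j2]}")
        elif op == "delete":
            feedback.append(f"missing {expected[i1:i2]}")
        elif op == "insert":
            feedback.append(f"extra {detected[j1:j2]}")
    return feedback


# Dutch 'goed': a learner replaces the voiced fricative with a voiceless one.
print(phoneme_feedback(["ɣ", "u", "t"], ["x", "u", "t"]))
# -> ["expected ['ɣ'] but heard ['x']"]
```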

Obviously, we didn't create all the tech ourselves

It is also important to realize that we use many state-of-the-art components we didn’t develop ourselves. For example, for speech synthesis we mostly aggregate the biggest speech synthesis providers, like Microsoft, Google, Amazon, ElevenLabs and Acapela, with just a few components of our own built around them (a sketch of that idea follows below). For speech recognition we built our own engine, but one of the features of this engine is “smart model selection”, and we can plug in any external speech recognition model or API on top of our own models. For open conversations, we obviously build on foundation models.
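
To illustrate what “aggregating providers with a few components of our own around them” can mean, here is a minimal, hypothetical sketch of a provider-agnostic speech synthesis interface. The class names are invented and the provider calls are stubbed rather than real SDK calls:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: several TTS providers behind one common interface,
# with a thin aggregation layer of our own on top.

class SpeechSynthesizer(ABC):
    @abstractmethod
    def synthesize(self, text: str, voice: str) -> bytes: ...

class AzureSynthesizer(SpeechSynthesizer):
    def synthesize(self, text: str, voice: str) -> bytes:
        raise NotImplementedError("call the Microsoft TTS API here")

class ElevenLabsSynthesizer(SpeechSynthesizer):
    def synthesize(self, text: str, voice: str) -> bytes:
        raise NotImplementedError("call the ElevenLabs API here")

class AggregatedSynthesizer(SpeechSynthesizer):
    """The aggregation layer: route each voice to the right provider."""

    def __init__(self, providers: dict, default: str):
        self.providers, self.default = providers, default

    def synthesize(self, text: str, voice: str) -> bytes:
        # Voices are named "<provider>:<voice_id>"; fall back to the default.
        provider = voice.split(":", 1)[0] if ":" in voice else self.default
        return self.providers[provider].synthesize(text, voice)
```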

We sometimes get the question “Isn’t the state of the art catching up with you?” The opposite is true, precisely because of this building-block approach: for our particular use case(s), we feel we are pulling further ahead of the generic state of the art each year.

It's not about the tech. Not really.

Just having the technology and building blocks needed for our use case does not by itself make for the best voicebot applications, though. Making great language learning conversations is also about:
  • combining the building blocks in a way the domain experts agree with, for which we work together with research groups, publishers and a vast network of experts related to language learning,
  • having the best UX, built on 50+ user tests, tons of insights about chatbot/voicebot interfaces for various audiences, and collaboration with professional UX people,
  • applying game design to keep learners as motivated and engaged as possible, by working together with game professionals.

We are not a pure technology company, but a company that tries to combine technological, didactic, UX and game expertise into great products.