Making Sense of Tech

 

It's been a few months since Sepp Hochreiter (one of the big names in AI) announced that his team was working hard on new AI architectures that match state-of-the-art LLM capabilities while requiring much less computing power. This week, his team published research that sheds more light on what they are doing. And introduces a new buzzword: xLSTM.

Let me explain what they did in simple terms.

What is the problem?

When you talk to another person, what you say builds on your previous sentences. To fully understand the meaning of what you say, I will therefore usually need to know what you said before – your words depend on the words you spoke earlier. Most current LLMs address this with Transformer architectures: for each word the model processes, its connection to every other word in the text is calculated. This leads to high computational costs for long texts.
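To make the "every word connects to every other word" idea concrete, here is a minimal sketch of self-attention scores in plain NumPy. It is an illustration, not any particular model's implementation; the point is that for n words it builds an n x n weight matrix, so cost grows quadratically with text length.

```python
import numpy as np

def attention_scores(embeddings):
    """Toy self-attention weights: every word attends to every other word.

    embeddings: (n_words, d) array. For n words this builds an n x n
    matrix, so compute and memory grow quadratically with text length.
    """
    n, d = embeddings.shape
    scores = embeddings @ embeddings.T / np.sqrt(d)   # (n, n) pairwise scores
    # softmax over each row so the weights for one word sum to 1
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights

words = np.random.default_rng(0).normal(size=(6, 4))  # 6 "words", 4-dim each
w = attention_scores(words)
print(w.shape)  # (6, 6): one weight for every pair of words
```

Doubling the text length from 6 to 12 words quadruples the size of that weight matrix – that is the cost problem in a nutshell.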

Some further background?

An older AI architecture is the long short-term memory (LSTM) network. It takes a different approach to understanding what you say: it goes through a text from beginning to end and, at each word, decides which information is relevant and must be memorized. This incurs much lower computational costs than Transformers. But it is also less capable of understanding a sentence in the context of the complete text. Transformers therefore beat LSTMs on quality, which is why they dominate LLM architectures today.

What's new?

In their research, the authors (including inventors of the LSTM) improve on the LSTM architecture and introduce xLSTM. An improved memory mechanism retains the relevant information better – this is meant to close the quality gap to Transformer-based LLMs. Further adaptations make the algorithm run faster and, in parts, in parallel.
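One of the paper's key ideas is replacing the fixed memory vector with a matrix updated via key/value outer products (the "mLSTM" variant). The sketch below is a heavily simplified, hedged illustration of that idea – gate values are plain scalars here and details of the real formulation are omitted – but it shows how a larger matrix memory can store more information per step.

```python
import numpy as np

def mlstm_step(q, k, v, f, i, C, n):
    """Simplified sketch of a matrix-memory recurrence in the spirit of
    xLSTM's mLSTM. Not the paper's exact formulation.

    C: (d, d) matrix memory, n: normalizer state,
    q/k/v: query/key/value vectors, f/i: scalar forget/input gates.
    """
    C = f * C + i * np.outer(v, k)    # write new info into the matrix memory
    n = f * n + i * k                 # normalizer accumulates key mass
    h = C @ q / max(abs(n @ q), 1.0)  # read out, normalized for stability
    return h, C, n

rng = np.random.default_rng(2)
d = 4
C, n = np.zeros((d, d)), np.zeros(d)
for _ in range(5):  # process 5 "words"
    q, k, v = rng.normal(size=(3, d))
    h, C, n = mlstm_step(q, k, v, f=0.9, i=0.5, C=C, n=n)
print(h.shape)  # (4,)
```

Because each step's memory update is a simple weighted sum of outer products, such recurrences can be computed largely in parallel across the sequence – which is what the performance adaptations aim at.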

Why is this relevant?

If the promises hold, LLMs will get faster, consume less power, and higher-quality LLMs can be brought to smaller computers such as laptops. That translates into lower costs of using AI and opens up new use cases. First tests indicate that the promise can hold – but it needs verification by others, by applying this architecture to LLMs and testing their performance and quality.

High-level view

In more general terms, it's great to be reminded that major breakthroughs in the AI domain came from people who set out to invent new mechanisms and architectures. Creating progress by feeding in ever more training data works, but has its limits.

 

Imprint        © Dominik Hörndlein 2025, all rights reserved.