AI Seminar Series: Kajetan Schweighofer

Oct 11, 2025
Kajetan Schweighofer

Venue: TII Yas Auditorium

28th October 2025, 11:00 AM - 12:00 PM (GST)

Title: Beyond Attention: xLSTM Scales Competitively with Linear Time-Complexity
Abstract: This talk discusses recent advances in the understanding of scaling behavior for large language models (LLMs), comparing the attention-based Transformer architecture to the recurrent xLSTM architecture. We begin with a brief introduction to xLSTM and its relation to other recently proposed Transformer alternatives such as Mamba-2. Next, we examine results on the comparison of scaling behavior between Transformers and xLSTM by means of empirically determined scaling laws. These show that the xLSTM architecture is Pareto-dominant in terms of cross-entropy loss and compute budget: within the analyzed ranges, it is always possible to obtain an xLSTM model that is both better (lower loss) and cheaper (less compute). We then analyze how these architectures scale across different compute budgets and how compute-optimal models compare to each other on the language modeling task. This analysis is also extended to the practically relevant overtraining regime, showing that the established scaling laws remain stable even at high token-to-parameter ratios. Importantly, training scaling behavior is examined with respect to context length, a critical aspect when comparing Transformers, which scale quadratically in context length, to alternative architectures that scale linearly. The results show that the benefit of xLSTM over Transformers increases for larger context lengths. Finally, inference scaling behavior is analyzed, finding that xLSTM has both lower time to first token (latency) and lower step time (per-token generation time).
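To illustrate the context-length point in the abstract, the minimal sketch below compares how per-sequence compute grows with context length T for quadratic self-attention versus a linear-time recurrent layer. It is not taken from the talk: the model width, layer count, and rough FLOP counts (only the sequence-mixing terms, counting a multiply-add as two FLOPs) are illustrative assumptions, and the recurrent cost model simply assumes an O(d^2) per-token state update, in the spirit of xLSTM's matrix-memory cell.

```python
# Illustrative sketch (not the speaker's methodology): compare how the
# sequence-mixing compute of attention (O(T^2 * d) per layer) and of a
# linear-time recurrent layer (assumed O(T * d^2) per layer) grows with
# context length T. d_model and n_layers are made-up example values.

def attention_flops(T: int, d_model: int = 1024, n_layers: int = 24) -> float:
    """Rough FLOPs for the QK^T and attention-weighted V matmuls per sequence."""
    return n_layers * 4 * (T ** 2) * d_model  # two T x T x d matmuls, mult+add

def recurrent_flops(T: int, d_model: int = 1024, n_layers: int = 24) -> float:
    """Rough FLOPs for a per-token state update assumed to cost O(d_model^2)."""
    return n_layers * 4 * T * (d_model ** 2)

if __name__ == "__main__":
    for T in (1_024, 8_192, 65_536):
        ratio = attention_flops(T) / recurrent_flops(T)
        print(f"T={T:>6}: attention / recurrent FLOP ratio ≈ {ratio:.1f}")
```

Under these assumptions the ratio grows as T / d_model, so the relative advantage of a linear-time architecture widens as the context length increases, which is the qualitative trend the talk quantifies with empirical scaling laws.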
Bio: Kajetan Schweighofer is a final-year PhD student at Johannes Kepler University Linz, supervised by Prof. Sepp Hochreiter, the inventor of the LSTM. His PhD focuses on predictive uncertainty quantification for deep learning models. This extends to natural language generation, where uncertainty information is used for hallucination detection in LLMs. Recently, his work has expanded to studying the scaling properties of LLM architectures, including Transformer alternatives such as xLSTM. His research has been published at the major AI conferences NeurIPS, ICML, and ICLR. Kajetan is part of the ELLIS PhD Program, which fosters collaboration between centers of AI excellence across Europe, and he conducted a research stay at the ELLIS Alicante Foundation. Prior to his PhD, he obtained a Bachelor's and a Master's in Physics, as well as a Master's in Artificial Intelligence, from Johannes Kepler University Linz.