Models & Research
Pioneering the Future
MERaLiON (Multimodal Empathetic Reasoning and Learning in One Network) is Southeast Asia’s empathetic Multimodal Large Language Model (MLLM), designed to understand the region’s diverse languages, cultures, and communication styles.
Unlike conventional speech AI, MERaLiON is designed for how people truly communicate - seamlessly code-switching between languages, using local dialects, and expressing meaning through tone, emotion, and context. These nuances are often lost in traditional systems, limiting their effectiveness in real-world applications, particularly in critical sectors such as healthcare and social services.
MERaLiON bridges this gap. By going beyond simple transcription, it captures both what is said and how it is said, enabling deeper contextual understanding, richer insights, and more natural, human-centric AI interactions.
A New Class of Speech-First AI
At its core, MERaLiON introduces a speech-first architecture that processes raw audio end-to-end. By tightly integrating a speech encoder with a text decoder, the model delivers powerful, real-time understanding of spoken communication.
With MERaLiON, you can:
• Answer questions directly from speech inputs
• Summarise conversations and dialogues with clarity
• Detect emotions, tone, and intent
• Interpret acoustic environments and contextual signals
Supporting seven core languages - English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese - along with Singlish, Cantonese, and Hokkien, MERaLiON reflects the authentic linguistic diversity of Southeast Asia.
Open, Collaborative, and Built for Impact
MERaLiON is developed with an open and transparent approach. By releasing model weights, benchmarks, and resources, it empowers researchers, developers, and enterprises to innovate, adapt, and deploy solutions that meet real-world needs.
MERaLiON isn’t just advancing AI - it’s redefining how AI understands people.
MERaLiON Model Portfolio
MERaLiON fuses a speech encoder and a text decoder to process raw audio end-to-end, reasoning directly from sound rather than transcribing first. It handles spoken QA, dialogue summarisation, and emotion inference, and supports speech in seven languages (English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese) plus Singlish, Cantonese, and Hokkien.
Our research is organised into core collections representing the evolution of Southeast Asian-centric AI. We prioritise transparency by releasing our model weights and benchmarks for community evaluation. Explore all MERaLiON models and resources on Hugging Face.
| Collection | Versions | Description |
|---|---|---|
| MERaLiON-3 (Preview) | 10B-preview | • Next-generation 10B model for speech-native reasoning. • Handles spoken QA, understanding speaker attributes, and paralinguistic reasoning (emotion, stress, acoustic scenes) directly from raw audio. |
| MERaLiON-2 | 10B, 10B-ASR, 10B-MLX, 3B, 3B-MLX | • Robust performance across a range of speech comprehension tasks, with competitive speech transcription for English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese. • Supports speech recognition, translation, emotion understanding, and instruction-following across Southeast Asian languages. |
| Speech Emotion Recognition | SER v1 | • Classifies speaker emotions from raw audio. • Designed for multilingual, real-world conversational scenarios. |
| Speech-Encoder | SpeechEncoder-v1, SpeechEncoder-2 | • High-performance multilingual speech foundation model on SEA languages for downstream speech AI tasks. • Strong capability in code-switching and local dialects. |
Built at Scale for Impact
MERaLiON was trained on NSCC Singapore’s ASPIRE 2A+ infrastructure - a high-performance platform enabling large-scale multimodal AI development.
Build with MERaLiON
Playground
Experience MERaLiON. Upload or record an audio clip, then ask anything: transcribe, translate, summarise, detect emotions.

API Console
Access MERaLiON via the API console. Ideal for developers building applications.


Connect
Connect with the team to discuss consortium membership and co-development opportunities.

