Models & Research

Pioneering the Future

MERaLiON (Multimodal Empathetic Reasoning and Learning in One Network) is Southeast Asia’s empathetic Multimodal Large Language Model (MLLM), designed to understand the region’s diverse languages, cultures, and communication styles.

Unlike conventional speech AI, MERaLiON is designed for how people truly communicate - seamlessly code-switching between languages, using local dialects, and expressing meaning through tone, emotion, and context. These nuances are often lost in traditional systems, limiting their effectiveness in real-world applications, particularly in critical sectors such as healthcare and social services.

MERaLiON bridges this gap. By going beyond simple transcription, it captures both what is said and how it is said, enabling deeper contextual understanding, richer insights, and more natural, human-centric AI interactions.

A New Class of Speech-First AI


At its core, MERaLiON introduces a speech-first architecture that processes raw audio end-to-end. By tightly integrating a speech encoder with a text decoder, the model delivers powerful, real-time understanding of spoken communication.

With MERaLiON, you can:
Answer questions directly from speech inputs
Summarise conversations and dialogues with clarity
Detect emotions, tone, and intent
Interpret acoustic environments and contextual signals
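
The capabilities above share one interaction pattern: an audio clip paired with a text instruction. The following is a minimal sketch of that pattern. The `SpeechRequest` container, the task keys, and the instruction wording are all hypothetical and are not part of MERaLiON's published interface.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative instruction text for each capability listed above.
# The task keys and wording are assumptions, not MERaLiON's official prompts.
TASK_INSTRUCTIONS = {
    "spoken_qa": "Answer this question using the audio: {question}",
    "summarise": "Summarise the conversation in the audio.",
    "emotion": "Describe the speaker's emotion, tone, and intent.",
    "acoustic_scene": "Describe the acoustic environment of the recording.",
}


@dataclass(frozen=True)
class SpeechRequest:
    """Hypothetical container pairing one audio clip with one instruction."""
    audio_path: str
    instruction: str


def build_request(audio_path: str, task: str, question: Optional[str] = None) -> SpeechRequest:
    """Turn a task name into a request for a speech-in, text-out model."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    if task == "spoken_qa" and not question:
        raise ValueError("spoken_qa needs a question")
    # str.format ignores unused keyword arguments, so one call covers every template.
    instruction = TASK_INSTRUCTIONS[task].format(question=question)
    return SpeechRequest(audio_path=audio_path, instruction=instruction)


request = build_request("clinic_call.wav", "summarise")
print(request.instruction)
```

Keeping the task in the instruction rather than in separate pipelines mirrors the idea of a single model that reasons over the audio directly, whichever task is requested.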

Supporting seven core languages - English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese - along with Singlish, Cantonese, and Hokkien, MERaLiON reflects the authentic linguistic diversity of Southeast Asia.

Open, Collaborative, and Built for Impact


MERaLiON is developed with an open and transparent approach. By releasing model weights, benchmarks, and resources, it empowers researchers, developers, and enterprises to innovate, adapt, and deploy solutions that meet real-world needs.

MERaLiON isn’t just advancing AI - it’s redefining how AI understands people.

MERaLiON Model Portfolio


MERaLiON fuses a speech encoder and a text decoder to process raw audio end-to-end, reasoning directly from sound rather than transcribing first. It handles spoken QA, dialogue summarisation, and emotion inference, and supports speech in seven languages (English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese) plus Singlish, Cantonese, and Hokkien.

Our research is organised into core collections representing the evolution of Southeast Asia-centric AI. We prioritise transparency by releasing our model weights and benchmarks for community evaluation.
Explore all MERaLiON models and resources on Hugging Face.

Collection: MERaLiON-3 (Preview)
Versions: 10B-preview

 • Next-generation 10B model for speech-native reasoning.

 • Handles spoken QA, speaker-attribute understanding, and paralinguistic reasoning (emotion, stress, acoustic scenes) directly from raw audio.

Collection: MERaLiON-2
Versions: 10B, 10B-ASR, 10B-MLX, 3B, 3B-MLX

 • Robust performance across a range of speech comprehension tasks, with competitive speech transcription for English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese.

 • Supports speech recognition, translation, emotion understanding, and instruction-following across Southeast Asian languages.

Collection: Speech Emotion Recognition
Versions: SER v1

 • Classifies speaker emotions from raw audio.

 • Designed for multilingual, real-world conversational scenarios.

Collection: Speech-Encoder
Versions: SpeechEncoder-v1, SpeechEncoder-2

 • High-performance multilingual speech foundation model for Southeast Asian (SEA) languages, intended for downstream speech AI tasks.

 • Strong capability in code-switching and local dialects.
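
The portfolio above can also be read as a small lookup structure. The sketch below transcribes the collections and versions from the listing. The task tags and the `collections_for` helper are illustrative assumptions for picking a starting point, not an official catalogue format.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Collection:
    name: str
    versions: Tuple[str, ...]
    tasks: FrozenSet[str]  # illustrative tags derived from the descriptions above


# Names and versions are copied from the portfolio listing; task tags are assumptions.
PORTFOLIO = (
    Collection(
        "MERaLiON-3 (Preview)",
        ("10B-preview",),
        frozenset({"spoken_qa", "speaker_attributes", "paralinguistic_reasoning"}),
    ),
    Collection(
        "MERaLiON-2",
        ("10B", "10B-ASR", "10B-MLX", "3B", "3B-MLX"),
        frozenset({"speech_recognition", "translation",
                   "emotion_understanding", "instruction_following"}),
    ),
    Collection(
        "Speech Emotion Recognition",
        ("SER v1",),
        frozenset({"emotion_classification"}),
    ),
    Collection(
        "Speech-Encoder",
        ("SpeechEncoder-v1", "SpeechEncoder-2"),
        frozenset({"speech_representation"}),
    ),
)


def collections_for(task: str) -> List[str]:
    """Return the names of collections whose description mentions the task."""
    return [c.name for c in PORTFOLIO if task in c.tasks]


print(collections_for("translation"))
```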


Built at Scale for Impact


MERaLiON was trained on NSCC Singapore’s ASPIRE 2A+ infrastructure - a high-performance platform enabling large-scale multimodal AI development.

Build with MERaLiON


Playground
Experience MERaLiON. Upload or record an audio clip, then ask anything: transcribe, translate, summarise, detect emotions.

API Console
Access MERaLiON via the API console. Ideal for developers building applications.

Download
Download and deploy MERaLiON models directly on your own systems.

Connect
Connect with the team to discuss consortium membership and co-development opportunities.
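
For the Download option, the following is a minimal sketch of fetching released weights from Hugging Face with the `snapshot_download` function from the `huggingface_hub` package. The helper names and the folder layout are hypothetical, and the repository ID shown in the usage note is an assumption; confirm the exact name on the MERaLiON Hugging Face page first.

```python
from pathlib import Path


def local_dir_for(repo_id: str, target_dir: str = "models") -> Path:
    """Map a Hugging Face repo ID to a local folder (layout is an assumption)."""
    return Path(target_dir) / repo_id.replace("/", "__")


def download_weights(repo_id: str, target_dir: str = "models") -> Path:
    """Download every file in the repository into a local folder."""
    # Imported lazily so this module loads even when huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    local_dir = local_dir_for(repo_id, target_dir)
    snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return local_dir
```

A call such as `download_weights("MERaLiON/MERaLiON-2-3B")` would then place the files under `models/`, assuming that repository ID is correct. The 10B variants need considerably more disk space and memory than the 3B ones.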


Research Library

No. List of Papers Date
1. AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought 27 Jan 2026
2. Latent-RQ: Enhancing Speech Pre-training with Latent Representations and Random Quantization 27 Jan 2026
3. Train Multi-Modal LLM to Understand Diverse Speech Paralinguistics by Distilling from Teacher with Meta-Information Prompt 27 Jan 2026
4. IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models 12 Nov 2025
5. MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages 7 Nov 2025
6. A Benchmark for Translations Across Styles and Language Variants 4 Nov 2025
7. Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs 29 Sep 2025
8. Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data 24 Sep 2025
9. Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models 10 Aug 2025
10. MERaLiON-AudioLLM: Advancing Speech and Language Understanding for Singapore 27 Jul 2025
11. CCL-XCoT: An Efficient Cross-Lingual Knowledge Transfer Method for Mitigating Hallucination Generation 17 Jul 2025
12. Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation 19 May 2025
13. Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models 2 Jan 2025
14. MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond 20 Dec 2024
15. MERaLiON-AudioLLM: Technical Report 13 Dec 2024
16. MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders 10 Sep 2024
17. PRESENT: Zero-Shot Text-to-Prosody Control 13 Aug 2024
18. AudioBench: A Universal Benchmark for Audio Large Language Models 23 Jan 2024
19. SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning 9 Sep 2023