
Models & Research

Pioneering the Future

MERaLiON (Multimodal Empathetic Reasoning and Learning in One Network) is Southeast Asia’s empathetic Multimodal Large Language Model (MLLM), designed to understand the region’s diverse languages, cultures, and communication styles.

Unlike conventional speech AI, MERaLiON is designed for how people truly communicate - seamlessly code-switching between languages, using local dialects, and expressing meaning through tone, emotion, and context. These nuances are often lost in traditional systems, limiting their effectiveness in real-world applications, particularly in critical sectors such as healthcare and social services.

MERaLiON bridges this gap. By going beyond simple transcription, it captures both what is said and how it is said, enabling deeper contextual understanding, richer insights, and more natural, human-centric AI interactions.

A New Class of Speech-First AI

At its core, MERaLiON introduces a speech-first architecture that processes raw audio end-to-end. By tightly integrating a speech encoder with a text decoder, the model delivers powerful, real-time understanding of spoken communication.

With MERaLiON, you can:

• Answer questions directly from speech inputs

• Summarise conversations and dialogues with clarity

• Detect emotions, tone, and intent

• Interpret acoustic environments and contextual signals
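Each capability above amounts to pairing the same audio clip with a different text instruction. The following is a minimal sketch of that idea, not the MERaLiON API: the names `TASK_INSTRUCTIONS`, `AUDIO_PLACEHOLDER`, and `build_prompt`, the instruction wording, and the `<SpeechHere>` marker are all assumptions made for illustration. The prompt template a released model actually expects should be taken from its model card.

```python
# Hypothetical sketch: one audio clip, many text instructions.
# Every name and string below is illustrative, not part of any MERaLiON API.

# Assumed marker showing where the encoded audio would be spliced into the prompt.
AUDIO_PLACEHOLDER = "<SpeechHere>"

# One instruction per capability listed above (wording is an assumption).
TASK_INSTRUCTIONS = {
    "spoken_qa": "Answer the question based on the audio: {question}",
    "summarise": "Summarise the conversation in the audio.",
    "emotion": "Describe the speaker's emotion, tone, and intent.",
    "acoustic_scene": "Describe the acoustic environment of the recording.",
}


def build_prompt(task: str, **fields: str) -> str:
    """Return a text prompt asking the model to perform `task` on an audio clip."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    # Fill in task-specific fields such as the question for spoken QA.
    instruction = TASK_INSTRUCTIONS[task].format(**fields)
    return f"Instruction: {instruction}\nAudio: {AUDIO_PLACEHOLDER}"


if __name__ == "__main__":
    print(build_prompt("spoken_qa", question="What time is the appointment?"))
    print(build_prompt("emotion"))
```

In this sketch only the instruction changes between tasks; the audio input stays the same, which is what lets a single speech-first model cover question answering, summarisation, and paralinguistic analysis.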

Supporting seven core languages - English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese - along with Singlish, Cantonese, and Hokkien, MERaLiON reflects the authentic linguistic diversity of Southeast Asia.

Open, Collaborative, and Built for Impact

MERaLiON is developed with an open and transparent approach. By releasing model weights, benchmarks, and resources, it empowers researchers, developers, and enterprises to innovate, adapt, and deploy solutions that meet real-world needs.

MERaLiON isn’t just advancing AI - it’s redefining how AI understands people.

MERaLiON Model Portfolio

MERaLiON fuses a speech encoder and a text decoder to process raw audio end-to-end, reasoning directly from sound rather than transcribing first. It handles spoken QA, dialogue summarisation, and emotion inference, and supports speech in seven languages (English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese) plus Singlish, Cantonese, and Hokkien.

Our research is organised into core collections representing the evolution of Southeast Asian-centric AI. We prioritise transparency by releasing our model weights and benchmarks for community evaluation. Explore all MERaLiON models and resources on Hugging Face.

Collection	Versions	Description
MERaLiON-3	10B, 3B-ASR	Next-generation model for speech-native reasoning. Handles spoken QA, speaker-attribute understanding, and paralinguistic reasoning (emotion, stress, acoustic scenes) directly from raw audio.
MERaLiON-2	10B, 10B-ASR, 10B-MLX, 3B, 3B-MLX	Robust performance across a range of speech comprehension tasks, with competitive speech transcription for English, Mandarin, Malay, Tamil, Bahasa Indonesia, Thai, and Vietnamese. Supports speech recognition, translation, emotion understanding, and instruction-following across Southeast Asian languages.
Speech Emotion Recognition	SER v1	Classifies speaker emotions from raw audio. Designed for multilingual, real-world conversational scenarios.
Speech-Encoder	SpeechEncoder-v1, SpeechEncoder-2	High-performance multilingual speech foundation model for SEA languages, built for downstream speech AI tasks. Strong capability in code-switching and local dialects.

Built at Scale for Impact

MERaLiON was trained on NSCC Singapore’s ASPIRE 2A+ infrastructure - a high-performance platform enabling large-scale multimodal AI development.

Build with MERaLiON

Playground

Experience MERaLiON. Upload or record an audio clip, then ask anything: transcribe, translate, summarise, detect emotions.

Try the MERaLiON Playground

API Console

Access MERaLiON via the API console. Ideal for developers building applications.

Get MERaLiON API Access

Download

Download and deploy MERaLiON models directly on your own systems.

Download & Self-Host

Connect

Connect with the team to discuss consortium membership and co-development opportunities.

Join the Consortium

Research Library

No.	List of Papers	Date
1.	Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs	4 Jun 2026
2.	AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought	27 Jan 2026
3.	Latent-RQ: Enhancing Speech Pre-training with Latent Representations and Random Quantization	27 Jan 2026
4.	Train Multi-Modal LLM to Understand Diverse Speech Paralinguistics by Distilling from Teacher with Meta-Information Prompt	27 Jan 2026
5.	IFEval-Audio: Benchmarking Instruction-Following Capability in Audio-based Large Language Models	12 Nov 2025
6.	MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages	7 Nov 2025
7.	A Benchmark for Translations Across Styles and Language Variants	4 Nov 2025
8.	Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs	29 Sep 2025
9.	Benchmarking Contextual and Paralinguistic Reasoning in Speech-LLMs: A Case Study with In-the-Wild Data	24 Sep 2025
10.	Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models	10 Aug 2025
11.	MERaLiON-AudioLLM: Advancing Speech and Language Understanding for Singapore	27 Jul 2025
12.	CCL-XCoT: An Efficient Cross-Lingual Knowledge Transfer Method for Mitigating Hallucination Generation	17 Jul 2025
13.	Contextual Paralinguistic Data Creation for Multi-Modal Speech-LLM: Data Condensation and Spoken QA Generation	19 May 2025
14.	Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models	2 Jan 2025
15.	MERaLiON-SpeechEncoder: Towards a Speech Foundation Model for Singapore and Beyond	20 Dec 2024
16.	MERaLiON-AudioLLM: Technical Report	13 Dec 2024
17.	MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders	10 Sep 2024
18.	PRESENT: Zero-Shot Text-to-Prosody Control	13 Aug 2024
19.	AudioBench: A Universal Benchmark for Audio Large Language Models	23 Jan 2024
20.	SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning	9 Sep 2023