
Mixture of Experts

An architecture that routes each input to a few specialized sub-networks, enabling massive parameter counts at manageable compute cost.

Key Facts
LLaMA 4-400B active params: ~40B per token
Efficiency gain: ~10× vs. dense equivalent
Expert count: 64 experts (LLaMA 4)
Router mechanism: top-2 of N expert selection
Pioneer paper: "Outrageously Large Neural Networks" (Shazeer et al., 2017)
Inference hardware: 2× A100 for a 400B MoE
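The ~10× efficiency figure above follows directly from the ratio of total to active parameters, since per-token compute scales with the parameters actually used. A quick illustrative check (the 400B/40B figures are taken from the facts above):

```python
# Per-token compute scales with *active* parameters, not total parameters.
total_params = 400e9   # full MoE parameter count
active_params = 40e9   # parameters actually used per token
speedup = total_params / active_params
print(speedup)  # 10.0 — roughly 10x less compute than a dense 400B model
```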

Mixture-of-Experts (MoE) is a neural network architecture in which the model comprises many "expert" sub-networks plus a learned "router" that decides which experts to activate for each input token. Because only a small fraction of experts is active for any given token, an MoE model is far cheaper to run than a dense model with the same total parameter count. LLaMA 4-400B uses MoE, activating roughly 40B of its 400B parameters per token. GPT-4 is widely believed to be an MoE model internally. MoE is now considered essential for building frontier-scale models economically.
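The routing mechanism described above can be sketched in a few lines. This is a minimal illustration, not any production implementation: the "experts" here are toy random linear maps, the dimensions are made up for the example, and the gating uses renormalized softmax over the top-2 logits, matching the "top-2 of N" selection in the key facts.

```python
import math
import random

random.seed(0)

DIM, NUM_EXPERTS, TOP_K = 4, 8, 2  # toy sizes for illustration only

# Toy "experts": each is a small random linear map (a DIM x DIM matrix).
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# Router: one weight vector per expert, producing a gating logit per token.
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(m, v):
    return [dot(row, v) for row in m]

def moe_forward(x):
    """Route token x to its top-2 experts and mix their outputs."""
    logits = [dot(w, x) for w in router]
    # Pick the TOP_K experts with the largest gating logits.
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    # Softmax over only the selected logits (renormalized top-k gating).
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    out = [0.0] * DIM
    for i in top:
        gate = exps[i] / z
        y = matvec(experts[i], x)  # only TOP_K of NUM_EXPERTS experts run
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top

output, chosen = moe_forward([1.0, -0.5, 0.3, 0.2])
print(len(chosen))  # 2 experts active out of 8
```

The key point the sketch makes concrete: the forward pass computes `matvec` for only 2 of the 8 experts, so compute per token is proportional to the active experts, while total capacity grows with the full expert count.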