Llama 4 is Meta's fourth generation of open-weight large language models, released in April 2025. Every Llama 4 model uses a Mixture-of-Experts (MoE) architecture: the flagship released model, Llama 4 Maverick, has roughly 400B total parameters but activates only 17B per token (128 experts), which lets it run on a single NVIDIA H100 DGX host despite its size. Meta reports that Maverick outperforms GPT-4o and Gemini 2.0 Flash across a broad range of benchmarks, including coding and reasoning. The released family comprises Llama 4 Scout (109B total / 17B active, 16 experts, able to fit on a single H100 GPU with Int4 quantization) and Maverick, with the much larger Llama 4 Behemoth previewed but not released at launch. Both released models are natively multimodal, officially support 12 languages, and offer context windows of up to 10M tokens (Scout) and 1M tokens (Maverick). Community fine-tunes began appearing on Hugging Face within days of release.
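To make the "400B total, 17B active" arithmetic concrete, here is a minimal sketch of top-1 MoE routing: a learned router scores every expert per token, and only the winning expert's feed-forward weights are applied, so most parameters sit idle on any given token. This is a toy PyTorch illustration, not Llama 4's actual routing code; the real design (shared expert, load balancing, expert parallelism) is an assumption left out here.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy mixture-of-experts feed-forward layer with top-1 routing.

    Illustrative only: expert count, gating, and balancing details
    differ from Llama 4's production implementation.
    """

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        # Router produces one score per expert for each token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model)
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        logits = self.router(x)                        # (n_tokens, n_experts)
        weights, idx = logits.softmax(dim=-1).max(-1)  # top-1 gate per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # Only the selected expert's parameters touch these tokens;
                # the other experts contribute no compute for them.
                out[mask] = weights[mask, None] * expert(x[mask])
        return out

layer = ToyMoELayer(d_model=64, d_ff=256, n_experts=8)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```

Because each token routes to one expert's feed-forward block, per-token compute scales with the active parameter count rather than the total, which is why a 400B-parameter MoE model can serve tokens at roughly the cost of a 17B dense feed-forward pass.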