What is Model Distillation?
Training a smaller "student" model to replicate the behavior of a larger "teacher" model at lower cost.
Definition
Model distillation (knowledge distillation) is a technique where a smaller, more efficient "student" model is trained to mimic the outputs and internal representations of a larger, more capable "teacher" model. Rather than training the student from scratch on raw data, it learns from the teacher's soft probability distributions over outputs, which contain richer information than hard labels alone. This transfers the teacher's learned knowledge into a more compact form. Distilled models can achieve 90-99% of the teacher's quality at a fraction of the size and inference cost. Many production AI systems use distilled models for latency-sensitive applications.
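The soft-label objective described above can be sketched as a small loss function. This is a minimal illustration, not any particular library's API: it follows the standard formulation (a temperature-softened KL term between teacher and student plus a cross-entropy term on the hard label), and the function names, temperature, and blending weight are illustrative choices.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with temperature; higher temperature yields softer distributions."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft term (KL divergence from the teacher's softened
    distribution to the student's) with a hard cross-entropy term."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on softened outputs; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures
    soft_loss = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student))) * temperature**2
    # Standard cross-entropy against the ground-truth hard label
    hard_loss = -np.log(softmax(student_logits)[hard_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

The soft term is where the "richer information" lives: the teacher's full probability distribution tells the student which wrong answers are nearly right, which one-hot labels cannot.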
Examples
DistilBERT: a distilled version of BERT that retains about 97% of its performance while being roughly 40% smaller and 60% faster at inference.
Why It Matters
Distillation is why smaller AI models can be surprisingly capable — they learned from larger ones. It explains the quality gap between model tiers and why "fast" models can still produce good results.
Related Terms
Pruning
Removing unnecessary parameters from a neural network to make it smaller and faster without significant quality loss.
AI Inference Optimization
Techniques that make AI models generate responses faster and cheaper without reducing output quality.
Model Merging
Combining weights from multiple fine-tuned models into a single model that inherits capabilities from each.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that trains small adapter matrices instead of modifying the full model.
Common Questions
What does Model Distillation mean in simple terms?
Training a smaller "student" model to replicate the behavior of a larger "teacher" model at lower cost.
Why is Model Distillation important for AI users?
Distillation is why smaller AI models can be surprisingly capable — they learned from larger ones. It explains the quality gap between model tiers and why "fast" models can still produce good results.
How does Model Distillation relate to AI chatbots like ChatGPT?
Model Distillation is a fundamental concept in how AI assistants like ChatGPT, Claude, and Gemini work. For example, DistilBERT achieves about 97% of BERT's performance at roughly 60% of the size while running significantly faster. Understanding this helps you use AI tools more effectively.