
What is Pruning?

Removing unnecessary parameters from a neural network to make it smaller and faster without significant quality loss.

By Council Research Team · Updated: Jan 27, 2026

Definition

Pruning is a model compression technique that identifies and removes parameters (weights, neurons, or entire layers) that contribute minimally to the model's output quality. The key insight is that large neural networks are typically over-parameterized — many weights are near zero or redundant. Pruning methods include magnitude pruning (removing smallest weights), structured pruning (removing entire channels or attention heads), and iterative pruning-retraining cycles. Unstructured pruning can achieve high sparsity (90%+) but requires specialized hardware to realize speed gains. Structured pruning provides immediate speedups on standard hardware.
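Magnitude pruning, the simplest of these methods, can be sketched in a few lines. This is a minimal NumPy illustration (the function name and shapes are chosen for the example, not taken from any library): it finds the threshold below which the smallest-magnitude weights fall and zeroes them out to hit a target sparsity.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero out the smallest-magnitude
    fraction of weights, returning a new array at the target sparsity."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = magnitude_prune(w, 0.5)  # at least half the weights become zero
```

In practice this is applied per layer and interleaved with retraining (iterative pruning) so the network can recover accuracy lost at each step.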

Examples

1. Removing 90% of weights from a model while retaining 95% of accuracy
2. Structured pruning that removes entire attention heads that contribute little to output quality
3. SparseGPT pruning large language models to 50% sparsity in one shot without retraining
4. Lottery ticket hypothesis finding small subnetworks within large models that match full performance
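The structured variant in example 2 can be illustrated the same way. In this hypothetical NumPy sketch (names and the L2-norm importance score are assumptions for illustration), whole output channels with the smallest norms are dropped, shrinking the weight matrix itself rather than just zeroing entries, which is why structured pruning speeds up inference on standard hardware.

```python
import numpy as np

def prune_channels(weight: np.ndarray, keep_ratio: float):
    """Structured pruning: drop the output channels (rows) with the
    smallest L2 norms. weight shape: (out_channels, in_features)."""
    norms = np.linalg.norm(weight, axis=1)        # per-channel importance
    k = max(1, int(keep_ratio * weight.shape[0])) # channels to keep
    keep = np.sort(np.argsort(norms)[-k:])        # indices of strongest rows
    return weight[keep], keep

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 16))
small, kept = prune_channels(w, 0.5)  # matrix shrinks from 8 to 4 rows
```

Because the output is a genuinely smaller dense matrix, every downstream matrix multiply gets cheaper with no sparse-hardware support required.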

Why It Matters

Pruning enables AI models to run on less powerful hardware, including phones and laptops. It is one of the key techniques making local AI assistants and on-device processing possible.

Related Terms

Model Distillation

Training a smaller "student" model to replicate the behavior of a larger "teacher" model at lower cost.

AI Inference Optimization

Techniques that make AI models generate responses faster and cheaper without reducing output quality.

Mixed Precision Training

Training neural networks using a mix of 16-bit and 32-bit floating-point numbers to save memory and increase speed.

Sparse Attention

An efficient attention mechanism that processes only a subset of token relationships instead of all pairs.

Common Questions

What does Pruning mean in simple terms?

Removing unnecessary parameters from a neural network to make it smaller and faster without significant quality loss.

Why is Pruning important for AI users?

Pruning enables AI models to run on less powerful hardware, including phones and laptops. It is one of the key techniques making local AI assistants and on-device processing possible.

How does Pruning relate to AI chatbots like ChatGPT?

Pruning is a fundamental technique behind how AI assistants like ChatGPT, Claude, and Gemini are deployed efficiently. For example, a model can have 90% of its weights removed while retaining 95% of its accuracy. Understanding this helps you use AI tools more effectively.

Related Use Cases

Best AI for Coding

Best AI for Writing

AI Models Using This Concept

Claude · ChatGPT · Gemini

See Pruning in Action

Council lets you compare responses from multiple AI models side-by-side. Experience different approaches to the same prompt instantly.
