Quantization
Reducing the numerical precision of model weights (for example, from 32-bit to 8-bit or 4-bit) so the model uses less memory and runs faster.
Advanced optimization hardware efficiency
Full definition
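Quantization stores a model's weights at lower numerical precision, typically going from 32-bit or 16-bit floating point down to 8-bit or 4-bit values. Each weight takes fewer bits, so the model needs less memory and can run faster, usually at the cost of a small loss in accuracy.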
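To make the idea concrete, here is a minimal sketch of one common scheme, symmetric (absmax) 8-bit quantization, written in plain Python. The function names `quantize_int8` and `dequantize` and the sample weights are illustrative assumptions, not part of any particular library; real toolkits usually quantize per channel or per block and use more elaborate rounding.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (absmax scheme)."""
    max_abs = max(abs(w) for w in weights)
    # The largest magnitude maps to 127; guard against an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is now stored as a small integer plus one shared scale factor, which is where the memory saving comes from; the gap between `weights` and `restored` is the precision that was given up.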
Example in a business context
Running a 70B-parameter model on a single high-memory GPU thanks to 4-bit quantization, which shrinks the weights from about 280 GB at 32-bit precision to about 35 GB.
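The arithmetic behind this example can be checked with a rough back-of-the-envelope calculation. The sketch below is an estimate under stated assumptions: it counts the weights only (no activations, key-value cache, or per-block scale factors) and uses decimal gigabytes; the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


params = 70e9  # a 70B-parameter model
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(params, bits):.0f} GB")
```

At 4 bits per weight the estimate falls to roughly an eighth of the 32-bit figure, which is what brings a model of this size within the memory of a single large GPU. Actual usage will be somewhat higher once the overheads left out here are included.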