GlossarIA
Open AI glossary for companies

Quantization

Reducing the numerical precision of model weights (for example, from 32-bit to 8-bit or 4-bit) so the model uses less memory and runs faster.

Advanced optimization hardware efficiency

Full definition

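Quantization reduces the numerical precision of a model's weights, for example from 32-bit floating point to 8-bit or 4-bit integers. Each weight takes fewer bits to store, so the model uses less memory and runs faster, usually in exchange for a small loss of accuracy.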
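
To make the idea concrete, here is a minimal sketch of one common approach, symmetric per-tensor quantization, written in plain Python. The scheme, the function names quantize and dequantize, and the sample weights are assumptions chosen for illustration; real toolkits use more elaborate variants (per-channel or per-group scales, calibration data).

```python
# Illustrative sketch only: symmetric per-tensor quantization.
# The function names and sample values are hypothetical.

def quantize(weights, bits=8):
    """Map float weights to signed integers using a single shared scale."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -1.27, 0.04]      # made-up example weights
q8, s8 = quantize(weights, bits=8)
restored = dequantize(q8, s8)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q8, s8, max_error)
```

Only the integers and the single scale value need to be stored, which is where the memory saving comes from. Calling quantize(weights, bits=4) uses a coarser grid, so the saving is larger and so is the rounding error.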

Example in a business context

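Running a 70B-parameter model on a single GPU thanks to 4-bit quantization: the weights shrink from roughly 280 GB at 32-bit precision to roughly 35 GB at 4-bit, which can fit in the memory of a single high-end GPU.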
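
The arithmetic behind this example can be checked with a short calculation. The sketch below counts weight storage only, which is an assumption. Real deployments also need memory for activations and other runtime data, so actual requirements are higher. The helper name weight_memory_gb is hypothetical.

```python
# Rough estimate of weight storage only (assumption: no runtime overhead counted).

def weight_memory_gb(num_params, bits_per_weight):
    """Return the size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

params_70b = 70e9
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(params_70b, bits):.0f} GB")
```

Each halving of the bit width halves the memory needed for the weights, which is why 4-bit precision brings a 70B-parameter model within reach of a single GPU.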