Quantization

Reduction of numerical precision of model weights (from 32-bit to 8-bit or 4-bit) to use less memory and run faster.

Advanced optimization hardware efficiency

Full definition

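Quantization reduces the numerical precision of a model's weights, for example from 32-bit floating point to 8-bit or 4-bit values, so that the model uses less memory and runs faster, typically at some cost in accuracy.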
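
To make the idea concrete, here is a minimal sketch of one common scheme, symmetric per-tensor 8-bit quantization, written in plain Python. The function names and the sample weights are hypothetical and chosen only for illustration; real libraries usually work per channel or per block and store the results in packed tensors.

```python
def quantize_int8(weights):
    """Map floats to signed 8-bit integers using one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


# Hypothetical example weights, not taken from any real model.
weights = [0.12, -0.5, 0.33, 1.0, -0.98]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as one byte plus a single shared scale, instead of four bytes per weight, and the rounding error is bounded by half of the scale.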

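Example: 4-bit quantization makes it possible to run a 70B-parameter model on a single GPU, because the weights shrink from roughly 280 GB at 32-bit precision to roughly 35 GB.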
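
The memory figures behind this example follow from simple arithmetic, sketched below. The helper name `weight_memory_gb` is hypothetical, and the estimate covers only the weights: it ignores activations, the KV cache, and the small overhead of stored scales, so real requirements are somewhat higher.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Compare the precisions named in the definition for a 70B-parameter model.
for bits in (32, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(70e9, bits):.0f} GB")
```

At 4 bits the weights alone come to about 35 GB, which is within the memory of a single high-end GPU, whereas the 32-bit version at about 280 GB is not.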