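AutoGPTQ is an implementation of GPTQ, a post-training quantization algorithm for generative pre-trained transformers. It compresses the weights of large language models (LLMs) to low bit widths, which shrinks their memory and compute footprint and speeds up inference while largely preserving accuracy.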
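To illustrate what storing weights at a lower bit width means, here is a minimal sketch of plain round-to-nearest (RTN) asymmetric k-bit quantization with a scale and an integer zero-point. This is a simplified stand-in, not GPTQ itself: GPTQ also compensates for rounding error using approximate second-order (Hessian) information as it quantizes weights column by column. The function names `quantize_rtn` and `dequantize` are hypothetical and are not part of the AutoGPTQ API.

```python
from typing import List, Tuple


def quantize_rtn(weights: List[float], bits: int = 4) -> Tuple[List[int], float, int]:
    """Round-to-nearest asymmetric quantization of one group of weights (illustrative only)."""
    qmax = (1 << bits) - 1                      # largest code, e.g. 15 for 4-bit
    # Include 0 in the range so the zero-point stays within [0, qmax].
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / qmax if w_max > w_min else 1.0
    zero = round(-w_min / scale)                # integer code that represents 0.0
    # Map each weight to an integer code, clamped to the valid range.
    q = [min(max(round(w / scale) + zero, 0), qmax) for w in weights]
    return q, scale, zero


def dequantize(q: List[int], scale: float, zero: int) -> List[float]:
    """Recover approximate float weights from integer codes."""
    return [(v - zero) * scale for v in q]


if __name__ == "__main__":
    w = [-1.0, -0.5, 0.0, 0.5, 1.0]
    q, s, z = quantize_rtn(w, bits=4)
    print(q, s, z)
    print(dequantize(q, s, z))
```

Each weight is reconstructed to within about one quantization step (`scale`). Fewer bits mean a coarser step, which is the accuracy cost that GPTQ's error compensation tries to minimize.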

Features

  • Efficient quantization for large language models
  • Reduces memory usage without significant loss of accuracy
  • Supports various precision levels (e.g., 4-bit, 8-bit)
  • Compatible with Hugging Face Transformers
  • Accelerates inference on GPUs and CPUs
  • Helps deploy LLMs on resource-constrained hardware
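The memory savings from lower precision levels can be estimated with simple arithmetic. The sketch below computes weight storage only (activations and the KV cache are excluded). The optional per-group overhead assumes one 16-bit scale plus one `bits`-wide zero-point per group of weights. That is a rough, hypothetical model for illustration, not a statement of AutoGPTQ's exact on-disk format, and `weight_memory_gb` is a hypothetical helper.

```python
from typing import Optional


def weight_memory_gb(num_params: int, bits: int, group_size: Optional[int] = None) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    bits_per_weight = float(bits)
    if group_size is not None:
        # Assumed overhead: a 16-bit scale and a `bits`-wide zero-point per group.
        bits_per_weight += (16 + bits) / group_size
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n = 7_000_000_000  # a hypothetical 7B-parameter model
    for b in (16, 8, 4, 3, 2):
        print(f"{b:>2}-bit: {weight_memory_gb(n, b):.2f} GB")
    print(f" 4-bit, group 128: {weight_memory_gb(n, 4, group_size=128):.2f} GB")
```

Under these assumptions, a 7B-parameter model drops from about 14 GB of fp16 weights to about 3.5 GB at 4-bit, plus a small per-group overhead.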
