What is QLoRA (Quantized LoRA)? — AI Glossary 2026

Definition

Combines 4-bit quantization with LoRA fine-tuning, enabling fine-tuning of 65B+ parameter models on a single consumer GPU. Published by Tim Dettmers et al. (2023). Made democratized fine-tuning of large models practical.

Related Terms

LoRA (Low-Rank Adaptation)

A parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices to frozen model layers. Achieves near full fine-tuning performance while training less than 1% of parameters. Industry standard for adapting LLMs.

Quantization

Reducing the numerical precision of model weights (e.g., from 32-bit float to 4-bit integer) to shrink model size and speed up inference with minimal accuracy loss. Enables running large models on consumer hardware. Key for local deployments.

Fine-tuning

Continuing to train a pre-trained model on a domain-specific or task-specific dataset to specialize its behavior. Ranges from full fine-tuning (updating all weights) to parameter-efficient methods like LoRA and QLoRA.

Definition

Related Terms

All Terms