5 TurboQuant lessons for vector search teams
5 takeaways on Qdrant TurboQuant: how rotation changes compression, where recall holds up, and when safer quantizers fit better.

TurboQuant can cut vector memory while keeping search quality steadier than simpler quantizers.
This guide turns one Qdrant experiment into five practical lessons, using a 1536-dimension embedding as the memory baseline.
| Method | Compression vs. float32 | Typical tradeoff |
|---|---|---|
| Scalar quantization | 4x | Small recall loss, easy to run |
| Binary quantization | 32x | Very low memory, higher instability |
| TurboQuant 4-bit | 8x | Better geometry preservation than plain low-bit compression |
| TurboQuant 2-bit | 16x | More storage savings, more accuracy risk |
1. What quantization really buys you
Quantization is not just a storage trick. It determines how much vector data you can keep in memory, which starts to matter quickly as embedding dimensions and collection sizes grow. A 1536-dimension float32 vector takes about 6 KB (1536 values at 4 bytes each), so one million vectors consume roughly 6 GB before index overhead is counted.
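
The arithmetic behind those figures is short enough to check by hand. The sketch below only restates the numbers in the paragraph above; the variable names are illustrative, and index overhead is deliberately left out.

```python
# Rough memory estimate for raw float32 vectors (index overhead not included).
DIMENSIONS = 1536
BYTES_PER_FLOAT32 = 4
NUM_VECTORS = 1_000_000

bytes_per_vector = DIMENSIONS * BYTES_PER_FLOAT32  # 6144 bytes
total_bytes = bytes_per_vector * NUM_VECTORS

print(f"per vector: {bytes_per_vector / 1024:.1f} KiB")  # per vector: 6.0 KiB
print(f"collection: {total_bytes / 1e9:.2f} GB")         # collection: 6.14 GB
```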

The basic idea is simple: store fewer bits per value, accept some error, and hope retrieval quality stays good enough. Scalar quantization usually maps values into 256 bins and stores them as bytes, which gives about 4x compression. Push harder, and the savings rise while the chance of recall loss rises too.
- Float32: highest fidelity, highest memory use
- Scalar: common default, moderate savings
- Binary: extreme compression, weakest shape preservation
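
To make the 256-bin idea concrete, here is a minimal sketch of scalar quantization in plain Python. It assumes a simple min/max range per vector, which is a simplification chosen for illustration and not a description of how Qdrant computes its bins. The helper names `scalar_quantize` and `scalar_dequantize` are hypothetical.

```python
import random
from array import array

def scalar_quantize(vector):
    """Map each float to one of 256 bins between the vector's min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vector)
    return codes, lo, scale

def scalar_dequantize(codes, lo, scale):
    """Recover approximate floats from the one-byte codes."""
    return [lo + c * scale for c in codes]

random.seed(0)
vector = [random.gauss(0.0, 1.0) for _ in range(1536)]
codes, lo, scale = scalar_quantize(vector)
restored = scalar_dequantize(codes, lo, scale)

float32_bytes = len(array("f", vector).tobytes())  # size of the float32 form
max_error = max(abs(a - b) for a, b in zip(vector, restored))

print(float32_bytes / len(codes))  # 4.0
print(max_error <= scale / 2 + 1e-9)
```

Each value lands within half a bin width of the original, which is the "some error" the text refers to. Fewer bins mean wider bins and therefore larger error.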
2. Why TurboQuant starts with rotation
TurboQuant changes the order of operations. Instead of compressing the vector as-is, it rotates the vector first so that signal gets spread more evenly across dimensions. That matters because many embeddings carry more useful information in some coordinates than others, and plain quantizers do not account for that unevenness.
The rotation does not change distances by itself. It changes where the information sits, making the vector easier to compress without throwing away as much geometry. In Qdrant’s implementation, this is paired with a precomputed codebook and a scoring correction that helps offset the shrinkage introduced by quantization.
- Rotation spreads energy across dimensions
- Quantization happens after the vector is easier to encode
- Length renormalization helps correct score bias
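
The two claims about rotation (it preserves distances, and it spreads energy across coordinates) can be checked with a small experiment. The sketch below uses a randomized Hadamard transform, which is one common way to build a fast rotation; whether Qdrant uses this exact construction is an assumption not confirmed by the text above. The dimension is set to 1024 because this particular transform needs a power-of-two length, and `random_rotation` is a hypothetical helper name.

```python
import math
import random

def random_rotation(vector, signs):
    """Randomized Hadamard transform: flip signs, then apply an orthonormal
    fast Walsh-Hadamard transform. Length must be a power of two."""
    out = [x * s for x, s in zip(vector, signs)]
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n)
    return [x / norm for x in out]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

random.seed(0)
n = 1024
signs = [random.choice((-1.0, 1.0)) for _ in range(n)]

# A deliberately lopsided vector: almost all energy sits in one coordinate.
u = [0.01] * n
u[0] = 10.0
v = [random.gauss(0.0, 1.0) for _ in range(n)]

ru, rv = random_rotation(u, signs), random_rotation(v, signs)

# Inner products (and therefore distances) survive the rotation.
print(math.isclose(dot(u, v), dot(ru, rv), rel_tol=1e-9, abs_tol=1e-9))
# The largest coordinate shrinks because energy is now shared.
print(max(abs(x) for x in u), max(abs(x) for x in ru))
```

After rotation no single coordinate dominates, so a fixed low-bit grid wastes less of its range on outliers.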
3. Where TurboQuant beats plain low-bit compression
The case for TurboQuant is not that it uses fewer bits than every other method. It is that it tends to spend those bits more intelligently. A rotated vector is less lopsided, so a compact code can preserve more useful structure than a direct low-bit mapping of the original coordinates.

That makes TurboQuant appealing when you want a better balance of memory and recall than binary quantization, but do not want the tuning burden of product quantization. Qdrant’s 1.18 release also makes the feature easier to try in an existing collection, which lowers the cost of testing it in production-like settings.
- Good fit: teams that want lower memory without a huge recall drop
- Good fit: workloads where vector geometry matters more than raw compression
- Less ideal: cases that already tolerate very aggressive quality loss
4. What the bit depths mean in practice
TurboQuant is not one setting. Qdrant exposes several bit-depth options, including bits4, bits2, bits1.5, and bits1. Lower bit depth means stronger compression, but it also increases the chance that the encoded vector drifts away from the original one. That is the central tradeoff in the Qdrant experiment this guide draws on.
For teams deciding where to start, 4-bit is the safest first test. It usually gives a meaningful space reduction while keeping the result closer to the original geometry than the more aggressive options. From there, you can step down only if your recall metrics still hold.
- bits4: best first trial for most teams
- bits2: useful when memory pressure is stronger
- bits1.5 and bits1: only for very tight storage budgets
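
As a rough guide to what each option saves, the sketch below converts bits per dimension into a compression ratio and an approximate footprint for one million 1536-dimension vectors. It assumes each option name maps directly to that many bits per dimension and ignores any per-vector metadata or codebook storage, so treat the output as a lower bound.

```python
DIMENSIONS = 1536
NUM_VECTORS = 1_000_000
FLOAT32_BITS = 32

# Assumed bits per dimension for each option named above.
bit_depths = {"bits4": 4, "bits2": 2, "bits1.5": 1.5, "bits1": 1}

ratios = {name: FLOAT32_BITS / bits for name, bits in bit_depths.items()}

for name, bits in bit_depths.items():
    gigabytes = DIMENSIONS * bits / 8 * NUM_VECTORS / 1e9
    print(f"{name}: {ratios[name]:.1f}x smaller, about {gigabytes:.2f} GB per million vectors")
```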
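```python
from qdrant_client import QdrantClient, models

# Assumed local instance; point this at your own deployment.
client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="my_collection",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
    quantization_config=models.TurboQuantization(
        turbo=models.TurboQuantQuantizationConfig(
            bits=models.TurboQuantBitSize.BITS4,
            always_ram=True,  # keep the quantized vectors in RAM
        )
    ),
)
```

5. What the benchmark question should be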
The right benchmark is not “Which method compresses the most?” It is “Which method keeps recall stable enough for my data and query pattern?” That is why the Qdrant experiment compares TurboQuant with scalar and binary quantization across multiple dataset sizes rather than treating one result as universal.
If your collection is small or your quality bar is strict, a conservative quantizer may still be the better default. If your index is growing fast and you need more room in memory, TurboQuant is worth testing before you jump to harsher compression. The point is not to pick the most advanced option, but the option that keeps your search behavior predictable.
- Benchmark recall at your own scale, not just on toy data
- Check whether score bias changes ranking behavior
- Compare memory savings against latency and quality together
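
Recall is the number to track in such a benchmark. A minimal sketch of recall@k follows, assuming you already have, for each query, the IDs returned by an exact (unquantized) search and by the quantized search. The function names and the sample IDs are hypothetical and exist only to show the calculation.

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the true top-k neighbors that the quantized search returned."""
    truth = set(exact_ids[:k])
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)

def mean_recall(exact_results, approx_results, k):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(e, a, k) for e, a in zip(exact_results, approx_results)]
    return sum(scores) / len(scores)

# Hypothetical result IDs for two queries, for illustration only.
exact = [[1, 2, 3, 4, 5], [10, 11, 12, 13, 14]]
quantized = [[1, 2, 3, 9, 5], [10, 12, 11, 13, 14]]

print(mean_recall(exact, quantized, k=5))  # 0.9
```

The second query scores 1.0 even though two results swapped places, because recall@k ignores order within the top k. If ranking order matters for your application, pair it with an order-sensitive metric.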
How to decide
Pick scalar quantization if you want a simple, familiar default with mild compression. Pick binary only if memory pressure is extreme and you can tolerate a larger quality hit. Pick TurboQuant when you want a middle path: stronger compression than scalar, but less instability than the most aggressive low-bit methods.
If you are unsure, start with TurboQuant 4-bit on one collection, measure recall on your real queries, and only move lower if the numbers stay acceptable. That is the safest way to see whether it is a fit for your own vector search system.
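
That step-down procedure can be written as a small helper. This is a sketch of the decision rule described above, not part of any Qdrant API: `pick_bit_depth` is a hypothetical function, and the recall values are made up for illustration rather than taken from any benchmark.

```python
def pick_bit_depth(measured_recall, min_recall):
    """Walk from the safest option to the most aggressive and keep the last
    one whose measured recall still meets the target."""
    order = ["bits4", "bits2", "bits1.5", "bits1"]
    chosen = None
    for name in order:
        if measured_recall[name] < min_recall:
            break  # stop stepping down at the first failure
        chosen = name
    return chosen  # None means even bits4 missed the target

# Made-up recall numbers purely for illustration, not benchmark results.
measured = {"bits4": 0.98, "bits2": 0.95, "bits1.5": 0.90, "bits1": 0.82}

print(pick_bit_depth(measured, min_recall=0.94))  # bits2
```

A `None` result is the signal to fall back to scalar quantization or uncompressed vectors.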