
Comments (36)

  • amitport
    This is a great development for KV cache compression. I did notice a missing citation in the related works regarding the core mathematical mechanism, though. The foundational technique of applying a geometric rotation prior to extreme quantization, specifically for managing the high-dimensional geometry and enabling proper bias correction, was introduced in our NeurIPS 2021 paper, "DRIVE" (https://proceedings.neurips.cc/paper/2021/hash/0397758f8990c...). We used this exact rotational approach and a similar bias correction mechanism to achieve optimal distributed mean estimation. I also presented this work and subsequent papers in a private invited talk at Google shortly after publication. Given the strong theoretical overlap with the mechanisms in TurboQuant and PolarQuant, I hope to see this prior art acknowledged in the upcoming camera-ready versions.
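For readers unfamiliar with the rotate-then-quantize idea referenced above, here is a minimal sketch of the general technique: apply a random orthogonal rotation so the vector's energy is spread evenly across coordinates, quantize each coordinate to a single sign bit, and pick a per-vector scale so the reconstruction stays close to the original. This is an illustration only, not the DRIVE paper's exact algorithm; the scale below is a simple least-squares choice, and all names are made up for the example.

```python
import numpy as np

def rotate_and_quantize(x, seed=0):
    """Sketch: random rotation followed by 1-bit (sign) quantization.

    The rotation flattens the coordinate distribution, which is what
    makes extreme quantization well behaved. The scale here is the
    least-squares optimal scalar for the rotated vector.
    """
    rng = np.random.default_rng(seed)
    # Random orthogonal matrix via QR of a Gaussian matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((x.size, x.size)))
    z = Q @ x
    s = np.where(z >= 0, 1.0, -1.0)   # 1 bit per coordinate
    scale = np.dot(z, s) / x.size     # = ||z||_1 / d, least-squares scale
    return s, scale, Q

def dequantize(s, scale, Q):
    # Rotate the sign vector back and rescale.
    return scale * (Q.T @ s)

x = np.random.default_rng(1).standard_normal(256)
s, scale, Q = rotate_and_quantize(x)
xhat = dequantize(s, scale, Q)
# With a random rotation, the normalized squared error concentrates
# around 1 - 2/pi ~ 0.36, despite using only one bit per coordinate.
```

In practice a structured transform (e.g. a randomized Hadamard) replaces the dense QR matrix so the rotation costs O(d log d) instead of O(d^2).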
  • benob
    This is the worst lay-people explanation of an AI component I have seen in a long time. It doesn't even seem AI generated.
  • zeeshana07x
The gap between how this is described in the paper vs the blog post is pretty wide. Would be nice to see more accessible writing from research teams — not everyone reading is an ML engineer.
  • bluequbit
I did not understand what PolarQuant is. Is it something like pattern-based compression, where the algorithm finds repeating patterns and creates an index of those common symbols or numbers?
  • moktonar
Aren’t polar coordinates still n-1 angles plus 1 radius for an n-dim vector? If so, I understand that the angles can be quantized better, but when the radius r is big, the error from heavily quantized angles is large, right? What am I missing?
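The geometric point in the comment above can be checked numerically. The sketch below (an illustration of the general polar-quantization geometry, not of PolarQuant's actual scheme) quantizes only the angle of a 2-D vector and measures the Euclidean error at two radii; all function names are made up for the example.

```python
import numpy as np

def quantize_angle(theta, bits):
    # Uniformly quantize an angle to 2**bits levels over [0, 2*pi).
    step = 2 * np.pi / (2 ** bits)
    return np.round(theta / step) * step

# Same angular error at two different radii.
errors = {}
theta = 1.234
for r in (1.0, 100.0):
    x = np.array([r * np.cos(theta), r * np.sin(theta)])
    tq = quantize_angle(theta, bits=4)
    xq = np.array([r * np.cos(tq), r * np.sin(tq)])
    # Chord length 2*r*sin(|theta - tq|/2): absolute error grows
    # linearly with r, but the *relative* error stays constant.
    errors[r] = np.linalg.norm(x - xq)
```

So the commenter is right that the absolute error scales with r; the usual mitigation is that the radius is a single scalar per vector and can be stored at higher precision cheaply, leaving the relative error bounded by the angular resolution.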
  • maurelius2
I'm somewhat at a loss here beyond the fundamentals. Can someone tell me how the compression impacts performance?
  • lucrbvi
    Sounds like Multi-Head Latent Attention (MLA) from DeepSeek
  • mskkm
Pied Piper vibes. As far as I can tell, this algorithm is hardly compatible with modern GPU architectures. My guess is that’s why the paper reports accuracy-vs-space but conveniently avoids reporting inference wall-clock time. The baseline numbers also look seriously underreported. “Several orders of magnitude” speedups for vector search? Really? Has anyone actually reproduced these results?