Worth mentioning that MeshOptimizer (https://github.com/zeux/meshoptimizer) has become one of a handful of 'hidden champion' pillar libraries that probably carry half of the gaming industry. Basically the curl of asset pipelines ;) https://github.com/zeux/meshoptimizer/discussions/986

While we could utilize zigzag encoding (i>>31) ^ (i<<1) to convert SLEB128-encoded type/addend to use ULEB128 instead, the generated code is inferior to or on par with SLEB128 for one-byte encodings on x86, AArch64, and RISC-V. Haven't tried wider values, but zigzag encoding is likely slower as well.

  // One-byte case for SLEB128: bit 6 is the sign bit,
  // so 0..63 are non-negative and 64..127 map to -64..-1.
  int64_t from_signext(uint64_t v) {
    return v < 64 ? v : v - 128;
  }

  // One-byte case for ULEB128 with zig-zag encoding
  int64_t from_zigzag(uint64_t z) {
    return (z >> 1) ^ -(z & 1);
  }
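
For anyone who wants to poke at the two decoders side by side, here is a minimal scalar sketch in C. The function names (zigzag_encode, zigzag_decode, sleb128_one_byte, check_roundtrip) are made up for illustration and are not from the quoted post; it uses the 64-bit form of the zigzag mapping and assumes a two's complement target for the unsigned-to-signed casts.

```c
#include <assert.h>
#include <stdint.h>

/* Zigzag encode: maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
   Done in unsigned arithmetic so no negative value is ever shifted. */
static inline uint64_t zigzag_encode(int64_t x) {
    uint64_t u = (uint64_t)x;
    uint64_t sign = (uint64_t)0 - (u >> 63); /* all ones if x < 0, else 0 */
    return (u << 1) ^ sign;
}

/* Zigzag decode: the low bit selects whether the remaining bits are
   inverted. The final cast assumes two's complement conversion. */
static inline int64_t zigzag_decode(uint64_t z) {
    return (int64_t)((z >> 1) ^ ((uint64_t)0 - (z & 1)));
}

/* One-byte SLEB128 decode. Assumes b < 0x80 (no continuation bit);
   bit 6 is the sign bit, so 0x40..0x7f map to -64..-1. */
static inline int64_t sleb128_one_byte(uint8_t b) {
    return b < 64 ? (int64_t)b : (int64_t)b - 128;
}

/* Hypothetical helper: checks that decode inverts encode for one value. */
static inline void check_roundtrip(int64_t x) {
    assert(zigzag_decode(zigzag_encode(x)) == x);
}
```

Both one-byte decoders cover the same range, -64..63, so the comparison in the quote is purely about which one the compiler turns into fewer or cheaper instructions.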

Is the matrix for bit shifting upside down or am I momentarily making a really dumb mistake here? Edit: nvm I missed the footnote which clarifies that for whatever reason the instruction populates the matrix from bottom to top.

This sort of analysis is great. Now why can't compilers do this sort of thing automatically? Almost any problem seems to be possible to speed up 1000x with AVX-512 plus days of thought, compared to the naive version written in a Python loop. If we could automate that whole process for big codebases, the performance gains could be huge.


Zigzag Decoding with AVX-512
