AVX-512 Optimization For Linux RAID Showing Up To 41% Improvement On AMD Ryzen 9 9950X

Eric Biggers has written an AVX-512 optimized xor_gen() function for the Linux RAID code. The kernel's xor_gen() function is used for generating and validating parity blocks, such as for RAID5/RAID6. In today's patch he explains the implementation and which hardware it targets: AMD Zen 4 and newer, Intel Sapphire Rapids and newer on the server side, and on the Intel client side either Rocket Lake or the upcoming Nova Lake and later.
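To make the parity idea concrete, here is a minimal portable C sketch of XOR (P) parity as used by RAID5, plus rebuilding one lost block from it. The helpers xor_parity() and xor_rebuild() are hypothetical illustrations, not the kernel's xor_gen() API, and they use plain byte loops rather than AVX-512. RAID6's second (Q) syndrome uses Galois-field arithmetic and is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helpers for illustration only; NOT the kernel's xor_gen()
 * signature. A stripe has nsrc data blocks, each len bytes long.
 */

/* P parity: parity[i] = src[0][i] ^ src[1][i] ^ ... ^ src[nsrc-1][i] */
static void xor_parity(uint8_t *parity, const uint8_t *const *src,
                       size_t nsrc, size_t len)
{
    memset(parity, 0, len);
    for (size_t s = 0; s < nsrc; s++)
        for (size_t i = 0; i < len; i++)
            parity[i] ^= src[s][i];
}

/*
 * Rebuild the data block at index `missing`: start from the parity and XOR
 * in every surviving block. Since x ^ x == 0, the survivors cancel out and
 * only the lost block remains.
 */
static void xor_rebuild(uint8_t *out, const uint8_t *parity,
                        const uint8_t *const *src, size_t nsrc,
                        size_t missing, size_t len)
{
    assert(missing < nsrc);
    memcpy(out, parity, len);
    for (size_t s = 0; s < nsrc; s++) {
        if (s == missing)
            continue;            /* the lost block is never read */
        for (size_t i = 0; i < len; i++)
            out[i] ^= src[s][i];
    }
}
```

An optimized xor_gen() speeds up this kind of byte-wise XOR over large buffers; wide vector registers let each instruction process many bytes at once instead of one.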
"Add an implementation of xor_gen() using AVX-512.
It uses 512-bit vectors, i.e. ZMM registers. It also uses the vpternlogq instruction to do three-input XORs when applicable.
It's enabled on x86_64 CPUs that have AVX512F && !PREFER_YMM. In practice that means:
- AMD Zen 4 and later (client and server)
- Intel Sapphire Rapids and later (server)
- Intel Rocket Lake (client)
- Intel Nova Lake and later (client)
The !PREFER_YMM condition excludes the older AVX-512 implementations in Intel Skylake Server and Intel Ice Lake. They could run this code, but they're known to have overly-eager downclocking when ZMM registers are used. This is the same policy that the crypto and CRC code uses."
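The vpternlogq instruction mentioned in the patch computes an arbitrary three-input bitwise function, selected by an 8-bit immediate. The portable C sketch below models that per-bit rule in scalar code (no intrinsics; ternlog64(), TERN_A/TERN_B/TERN_C and XOR3_IMM are hypothetical names) and shows why a three-input XOR corresponds to the immediate 0x96.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model of the VPTERNLOGQ bit rule (illustration only): for each bit
 * position, the three input bits form the index (a << 2) | (b << 1) | c,
 * and the result bit is bit <index> of imm.
 */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm)
{
    uint64_t r = 0;
    for (int bit = 0; bit < 64; bit++) {
        unsigned idx = (unsigned)((((a >> bit) & 1) << 2) |
                                  (((b >> bit) & 1) << 1) |
                                  ((c >> bit) & 1));
        r |= (uint64_t)((imm >> idx) & 1) << bit;
    }
    return r;
}

/*
 * The immediate for any three-input function is found by applying that
 * function to the canonical bit patterns a = 0xF0, b = 0xCC, c = 0xAA.
 */
#define TERN_A 0xF0
#define TERN_B 0xCC
#define TERN_C 0xAA
#define XOR3_IMM ((uint8_t)(TERN_A ^ TERN_B ^ TERN_C))

static_assert((TERN_A ^ TERN_B ^ TERN_C) == 0x96, "a ^ b ^ c is imm8 0x96");
```

With that immediate, a single vpternlogq on ZMM registers does the work of two separate two-input XORs, which is presumably the saving behind "three-input XORs when applicable".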
Where it gets really exciting is the improvement from this AVX-512 implementation. In testing on an AMD Ryzen 9 9950X (Zen 5) desktop processor, it delivers between a 19% and 41% improvement.
A pretty damn nice improvement on top of all the other AVX-512 optimizations Eric Biggers has made in recent times. Hopefully this patch will work its way to the mainline kernel in the near future.
