From what? Automatic vectorization is very limited; it’s only good for purely vertical algorithms, where each lane is processed independently.
> and dynamically choose depending on CPU capabilities
Only Intel’s compiler does that automatically; the rest of them don’t.
Sometimes I ship multiple versions of the binaries. Other times I do the runtime dispatch myself. It’s only a couple of lines of code: call __cpuid(), then cache some function pointers or abstract class pointers. It’s even possible to compile these implementations from the same C++ source, using templates, macros, and/or something else.
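A minimal sketch of that kind of runtime dispatch, under some assumptions: the names (`max_i32`, `max_scalar`, `max_sse41`, `cpu_has_sse41`, the `DEMO_` macros) are hypothetical, SSE4.1 is just an arbitrary example feature, and on GCC/Clang the check uses `__builtin_cpu_supports` rather than the MSVC `__cpuid()` intrinsic. Real code would likely check more features and dispatch more than one function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define DEMO_X86 1
#include <smmintrin.h>  // SSE4.1 intrinsics
#if defined(__GNUC__) || defined(__clang__)
// Lets one function use SSE4.1 without building the whole file with -msse4.1.
#define DEMO_TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#include <intrin.h>     // __cpuid (MSVC)
#define DEMO_TARGET_SSE41
#endif
#endif

using MaxFn = std::int32_t (*)(const std::int32_t*, std::size_t);

// Portable fallback; requires n > 0.
std::int32_t max_scalar(const std::int32_t* p, std::size_t n) {
    assert(n > 0);
    std::int32_t m = p[0];
    for (std::size_t i = 1; i < n; ++i)
        if (p[i] > m) m = p[i];
    return m;
}

#if defined(DEMO_X86)
// SSE4.1 version: 4 lanes at a time, then a small horizontal reduction.
DEMO_TARGET_SSE41
std::int32_t max_sse41(const std::int32_t* p, std::size_t n) {
    if (n < 4) return max_scalar(p, n);
    __m128i m = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
    std::size_t i = 4;
    for (; i + 4 <= n; i += 4)
        m = _mm_max_epi32(m, _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i)));
    // Fold the four lanes down to lane 0.
    m = _mm_max_epi32(m, _mm_shuffle_epi32(m, _MM_SHUFFLE(1, 0, 3, 2)));
    m = _mm_max_epi32(m, _mm_shuffle_epi32(m, _MM_SHUFFLE(2, 3, 0, 1)));
    std::int32_t r = _mm_cvtsi128_si32(m);
    for (; i < n; ++i)  // leftover elements
        if (p[i] > r) r = p[i];
    return r;
}

bool cpu_has_sse41() {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    int regs[4];
    __cpuid(regs, 1);
    return (regs[2] & (1 << 19)) != 0;  // CPUID leaf 1, ECX bit 19 = SSE4.1
#endif
}
#endif

// Runs the CPU check and picks an implementation.
MaxFn pick_max() {
    MaxFn fn = max_scalar;
#if defined(DEMO_X86)
    if (cpu_has_sse41()) fn = max_sse41;
#endif
    return fn;
}

// Public entry point: the pointer is resolved on first call, then cached.
std::int32_t max_i32(const std::int32_t* p, std::size_t n) {
    static const MaxFn fn = pick_max();
    return fn(p, n);
}
```

Callers only ever see `max_i32`; the CPU check runs once and every later call is a single indirect call. Features that need OS support (AVX and later) also require an XGETBV check on the `__cpuid()` path, which this sketch leaves out.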