256b wide SVE vectors allow some simplification of truffle.
Up to 40% speedup on Graviton3, going from 12500 MB/s to 17000 MB/s
on the microbenchmark.
SVE2 also offers this capability for 128b vectors, with a speedup of
around 25% compared to normal SVE.
Add unit tests and a benchmark for this wide variant.
Signed-off-by: Yoan Picchi <yoan.picchi@arm.com>
Add the new Tamarama engine that acts as a container for infix/suffix
engines that can be proven to run exclusively of one another.
This reduces stream state for pattern sets with many exclusive engines.
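
To illustrate why a container for mutually exclusive engines can reduce
stream state, here is a minimal sketch. It assumes the container keeps a
single state slot sized for its largest sub-engine, plus a small field
recording which sub-engine is active. SubEngine, separateStateBytes,
containerStateBytes and activeIndexBytes are hypothetical names for this
illustration only; they do not describe the actual Tamarama layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an infix/suffix engine; only the size of its
// per-stream state matters for this illustration.
struct SubEngine {
    std::size_t streamStateBytes;
};

// Stream state needed when every engine keeps its own slot: the sum.
std::size_t separateStateBytes(const std::vector<SubEngine> &engines) {
    std::size_t total = 0;
    for (const auto &e : engines) {
        total += e.streamStateBytes;
    }
    return total;
}

// Stream state needed when engines are proven exclusive and share one slot:
// the largest sub-engine state plus an assumed "active engine" field.
std::size_t containerStateBytes(const std::vector<SubEngine> &engines,
                                std::size_t activeIndexBytes = 1) {
    assert(!engines.empty());
    std::size_t largest = 0;
    for (const auto &e : engines) {
        largest = std::max(largest, e.streamStateBytes);
    }
    return largest + activeIndexBytes;
}
```

Under these assumptions, three exclusive engines needing 8, 16 and 4 bytes
would take 28 bytes as separate slots but 17 bytes inside the container, and
the gap grows with the number of exclusive engines.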