[2408.00278] High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures