Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c++ - Auto vectorization not working

I'm trying to get my code to auto vectorize, but it isn't working.

int _tmain(int argc, _TCHAR* argv[])
{
    const int N = 4096;
    float x[N];
    float y[N];
    float sum = 0;

    //create random values for x and y 
    for (int i = 0; i < N; i++)
    {
        x[i] = rand() >> 1;
        y[i] = rand() >> 1;
    }

    for (int i = 0; i < N; i++){
        sum += x[i] * y[i];
    }
}

Neither loop vectorizes here, but I'm really only interested in the second loop.

I'm using Visual Studio Express 2013 and compiling with the /O2 and /Qvec-report:2 options (the latter reports whether or not each loop was vectorized). When I compile, I get the following message:

--- Analyzing function: main
c:\users\...\documents\visual studio 2013\projects\intrin3\intrin3\intrin3.cpp(28) : info C5002: loop not vectorized due to reason '1200'
c:\users\...\documents\visual studio 2013\projects\intrin3\intrin3\intrin3.cpp(41) : info C5002: loop not vectorized due to reason '1305'

According to the documentation for the vectorizer messages, reason '1305' means that "the compiler can't discern proper vectorizable type information for this loop." I'm not really sure what this means. Any ideas?

After splitting the second loop into two loops:

for (int i = 0; i < N; i++){
    sumarray[i] = x[i] * y[i];
}

for (int i = 0; i < N; i++){
    sum += sumarray[i];
}

Now the first of the above loops vectorizes, but the second one does not, again with error code 1305.


1 Answer


Reason 1305 is reported because the value of sum is never used, so the optimizer does not vectorize the loop. Simply adding printf("%f\n", sum) after the loop fixes that. But then you get a new reason code, 1105: "Loop includes a non-recognized reduction operation". To fix this you need to compile with /fp:fast.

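The reason is that floating-point addition is not associative, while a reduction done with SIMD or MIMD (i.e. using multiple threads) adds the terms in a different order than the scalar loop, which is only valid if the operation is associative. With a looser floating-point model the compiler is allowed to reorder the additions, so it can vectorize the reduction.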
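To see why the compiler refuses to reorder the additions under the default model, here is a minimal sketch of the non-associativity itself. The function names are made up for illustration, and the volatile temporaries are only there to force each intermediate result to be rounded to float; the sketch assumes IEEE-754 single precision with round-to-nearest.

```cpp
// Adds three floats strictly left to right: (a + b) + c.
float sum_left_to_right(float a, float b, float c) {
    volatile float t = a + b;   // rounded to float before the next addition
    return t + c;
}

// Adds the same three floats with a different grouping: a + (b + c).
float sum_regrouped(float a, float b, float c) {
    volatile float t = b + c;   // rounded to float before the next addition
    return a + t;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, the first grouping returns 1 and the second returns 0, because -1e8f + 1.0f rounds back to -1e8f. A vectorized reduction regroups the additions in the same way, which is why it is only allowed once you opt in with /fp:fast.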

I just tested the following code: with the default /fp:precise the loop does not vectorize, and with /fp:fast it does.

#include <stdio.h>
int main() {
    const int N = 4096;
    float x[N];
    float y[N];
    float sum = 0;
    for (int i = 0; i < N; i++){
        sum += x[i] * y[i];
    }
    printf("sum %f\n", sum);
}
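To make the reordering concrete, the sketch below writes out by hand roughly what a 4-wide vectorized reduction does: four independent partial sums that are combined at the end. This is an illustration in plain C++, not what the compiler actually emits; the lane count of 4 (one SSE register of floats) and the function names are assumptions.

```cpp
#include <cstddef>

// Scalar reference: adds the products strictly in index order,
// which is the order a strict floating-point model has to preserve.
float dot_sequential(const float* x, const float* y, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Hand-written "vector-style" reduction: element i goes into partial
// sum i % 4, so the additions happen in a different order.
float dot_four_lanes(const float* x, const float* y, std::size_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; ++k) {
            lane[k] += x[i + k] * y[i + k];
        }
    }
    // Combine the four partial sums, then handle any leftover elements.
    float sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

For inputs whose products and sums are exactly representable (small integers, for example) both functions agree. For general data they can differ in the last few bits, and that difference is exactly what /fp:fast permits.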

As for the loop that calls rand(): rand() is not a SIMD function, so that loop can't be vectorized. You would need a SIMD rand() function, and I don't know of one. An alternative is to precompute an array of random numbers and use the array instead. In any case, rand() is a poor random number generator and is only useful for toy cases. Consider using the Mersenne Twister PRNG.
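A minimal sketch of the precompute approach, assuming C++11 is available: std::mt19937 is the standard library's Mersenne Twister. The helper names, the seed parameter and the [0, 1) range are choices made for this example, not part of the original code. The fill loop stays scalar, and only the dot-product loop is a candidate for vectorization.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Fills a vector with pseudo-random floats up front. This loop is scalar:
// each draw depends on the generator state left by the previous one.
std::vector<float> make_random_floats(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    std::vector<float> values(n);
    for (float& v : values) {
        v = dist(gen);
    }
    return values;
}

// Dot product over the precomputed data. The loop body contains no
// function calls, only the multiply-add reduction discussed above.
float dot(const std::vector<float>& x, const std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Typical use would be to build x and y once with two different seeds, for example make_random_floats(4096, 1) and make_random_floats(4096, 2), and then call dot(x, y) and print the result so the optimizer cannot discard the loop.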

