So I have this question from my professor, and I can not figure out why vector2
is faster and has less cache misses than vector1
.
Assume that the code below is a valid compilable C code.
Vector2:
void incrementVector2(INT4* v, int n) {
for (int k = 0; k < 100; ++k) {
for (int i = 0; i < n; ++i) {
v[i] = v[i] + 1;
}
}
}
Vector1:
void incrementVector1(INT4* v, int n) {
for (int i = 0; i < n; ++i) {
for (int k = 0; k < 100; ++k) {
v[i] = v[i] + 1;
}
}
}
NOTE: INT4 means the integer is 4 Bytes in size.
In terms of CPU specs: cache size = 12288KB, line size=64B and only considering this single cache interacting with main memory.
Question
Why does vector2
have a faster runtime than vector1
? And why vector1
will have more cache misses than vector2
?
Me and a few classmates worked on this for a while and couldn't figure it out. We thought it could be due to the fact that vector2
has better spatial locatlity, since the array is been accessed by the inner loop constantly, while in vector1
, only one element is accessed at a time?
we are not sure if this is correct, and also not sure how to bring cache lines in to this either.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…