Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
630 views
in Technique[技术] by (71.8m points)

c++ - Why is iterating 2D array row major faster than column major?

Here is simple C++ code that compare iterating 2D array row major with column major.

#include <iostream>
#include <ctime>

using namespace std;

const int d = 10000;

int** A = new int* [d];

int main(int argc, const char * argv[]) {
    for(int i = 0; i < d; ++i)
        A[i] = new int [d];
    
    clock_t ColMajor = clock();
    
    for(int b = 0; b < d; ++b)
        for(int a = 0; a < d; ++a)
            A[a][b]++;
    
    double col = static_cast<double>(clock() - ColMajor) / CLOCKS_PER_SEC;
    
    clock_t RowMajor = clock();
    for(int a = 0; a < d; ++a)
        for(int b = 0; b < d; ++b)
            A[a][b]++;
    
    double row = static_cast<double>(clock() - RowMajor) / CLOCKS_PER_SEC;
    

    
    cout << "Row Major : " << row;
    cout << "
Column Major : " << col;

    return 0;
}

Result for different values of d:

d = 10^3 :

Row Major : 0.002431

Column Major : 0.017186

d = 10^4 :

Row Major : 0.237995

Column Major : 2.04471

d = 10^5

Row Major : 53.9561

Column Major : 444.339

Now the question is why row major is faster than column major?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

It obviously depends on the machine you're on but very generally speaking:

  1. Your computer stores parts of your program's memory in a cache that has a much smaller latency than main memory (even when compensating for cache hit time).

  2. C arrays are stored in a contiguous by row major order. This means if you ask for element x, then element x+1 is stored in main memory at a location directly following where x is stored.

  3. It's typical for your computer cache to "pre-emptively" fill cache with memory addresses that haven't been used yet, but that are locally close to memory that your program has used already. Think of your computer as saying: "well, you wanted memory at address X so I am going to assume that you will shortly want memory at X+1, therefore I will pre-emptively grab that for you and place it in your cache".

When you enumerate your array via row major order, you're enumerating it in such a way where it's stored in a contiguous manner in memory, and your machine has already taken the liberty of pre-loading those addresses into cache for you because it guessed that you wanted it. Therefore you achieve a higher rate of cache hits. When you're enumerating an array in another non-contiguous manner then your machine likely won't predict the memory access pattern you're applying, so it wont be able to pre-emptively pull memory addresses into cache for you, and you won't incur as many cache hits, so main memory will have to be accessed more frequently which is slower than your cache.

Also, this might be better suited for https://cs.stackexchange.com/ because the way your system cache behaves is implemented in hardware, and spatial locality questions seem better suited there.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...