You guessed right. The MPI standard does not specify how stdout from different nodes should be collected for printing at the originating process. It is often the case that when multiple processes are doing prints the output will get merged in an unspecified way. fflush
doesn't help.
If you want the output ordered in a certain way, the most portable method would be to send the data to the master process for printing.
For example, in pseudocode:
if (rank == 0) {
print_col(0);
for (i = 1; i < comm_size; i++) {
MPI_Recv(buffer, .... i, ...);
print_col(i);
}
} else {
MPI_Send(data, ..., 0, ...);
}
Another method which can sometimes work would be to use barries to lock step processes so that each process prints in turn. This of course depends on the MPI Implementation and how it handles stdout.
for(i = 0; i < comm_size; i++) {
MPI_Barrier(MPI_COMM_WORLD);
if (i == rank) {
printf(...);
}
}
Of course, in production code where the data is too large to print sensibly anyway, data is eventually combine by having each process writing to a separate file and merged separately, or using MPI I/O (defined in the MPI2 standards) to coordinate parallel writes.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…