I am doing a sort_by_key with key-value int arrays of size 80 million. The device is a GTX 560 Ti with 2GB VRAM. When the available (free) memory before the sort_by_key is 1200MB, it finishes sorting in 200ms. But when the available memory drops to 600MB, the sort_by_key for the same key-value arrays takes 1.5-3s!
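
A minimal sketch of the measurement described above, for anyone trying to reproduce it. It assumes Thrust device_vector for the key and value arrays and fills them with descending keys, because the real data is not given. It times the call with a host clock plus cudaDeviceSynchronize, so any allocation done inside sort_by_key is included. The helper name sort_and_check and the optional "reserve" argument, which holds back device memory to mimic the low-memory case, are hypothetical.

```cuda
// Hypothetical repro: time sort_by_key on n int key-value pairs, optionally
// reserving device memory first so that less is free during the sort.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Sorts n descending int keys together with their values and reports the
// wall-clock time of the sort_by_key call (including any temporary
// allocation it performs). Returns true if the keys come out ascending and
// the values moved with them.
bool sort_and_check(size_t n, double* ms) {
    thrust::device_vector<int> keys(n), values(n);
    thrust::sequence(keys.begin(), keys.end(), static_cast<int>(n), -1);  // n, n-1, ..., 1
    thrust::sequence(values.begin(), values.end());                        // 0, 1, ..., n-1

    // Free device memory right before the sort, with the arrays resident.
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    std::printf("free before sort: %zu MB of %zu MB\n", free_b >> 20, total_b >> 20);

    cudaDeviceSynchronize();
    auto t0 = std::chrono::steady_clock::now();
    thrust::sort_by_key(keys.begin(), keys.end(), values.begin());
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();
    *ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    // Key 1 (value n-1) should now be first; key n (value 0) should be last.
    int first_val = values[0];
    int last_val = values[n - 1];
    return thrust::is_sorted(keys.begin(), keys.end()) &&
           first_val == static_cast<int>(n - 1) && last_val == 0;
}

int main(int argc, char** argv) {
    // Optional first argument: MB of device memory to hold during the sort.
    size_t reserve_mb = argc > 1 ? std::strtoull(argv[1], nullptr, 10) : 0;
    void* filler = nullptr;
    if (reserve_mb > 0 && cudaMalloc(&filler, reserve_mb << 20) != cudaSuccess) {
        std::fprintf(stderr, "could not reserve %zu MB\n", reserve_mb);
        return 1;
    }

    double ms = 0.0;
    bool ok = sort_and_check(80000000, &ms);  // 80 million pairs, as in the question
    std::printf("sort_by_key: %.1f ms (%s)\n", ms, ok ? "sorted" : "WRONG");

    cudaFree(filler);  // cudaFree(nullptr) is a no-op
    return ok ? 0 : 1;
}
```

Running it once without an argument and once with a reserve chosen so that the printed free memory is roughly 600MB would compare the two conditions. The right reserve depends on what else is resident on the card.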

I ran the program under Compute Visual Profiler. I found that the GPU timestamp jumps by 1.5-3s between the last kernel before sort_by_key and the first kernel call inside sort_by_key (which is a RakingReduction).

I suspect that sort_by_key does a memory allocation before it calls its first internal kernel. The memory that sort_by_key needs is available (even when available memory is 600MB), since the sort_by_key still works, just more slowly. When this happens, the computer freezes for 1s. I also see a bump in the CPU Physical Memory graph if I keep Process Explorer open.
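
To test the allocation suspicion above in isolation, one could time a bare cudaMalloc/cudaFree pair of a similar size under both free-memory conditions. This is a hedged sketch: the default size (one extra copy of the 80-million-element int key and value arrays, 640,000,000 bytes) is only a guess at what sort_by_key might request, not something the profiler output confirms, and time_alloc_ms is a hypothetical helper name.

```cuda
// Hypothetical check of the "allocation inside sort_by_key" suspicion:
// time one cudaMalloc + cudaFree of a buffer of a given size.
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Returns wall-clock milliseconds for one cudaMalloc + cudaFree of `bytes`;
// *ok reports whether the allocation succeeded.
double time_alloc_ms(size_t bytes, bool* ok) {
    cudaFree(nullptr);  // create the context up front so it is not timed
    void* p = nullptr;
    auto t0 = std::chrono::steady_clock::now();
    *ok = (cudaMalloc(&p, bytes) == cudaSuccess);
    if (*ok) cudaFree(p);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main(int argc, char** argv) {
    // Default: 80 million int keys + 80 million int values = 640,000,000 bytes.
    // That sort_by_key allocates this much is an assumption, not a measurement.
    size_t bytes = argc > 1 ? std::strtoull(argv[1], nullptr, 10)
                            : 2ull * 80000000ull * sizeof(int);

    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    std::printf("free: %zu MB of %zu MB\n", free_b >> 20, total_b >> 20);

    bool ok = false;
    double ms = time_alloc_ms(bytes, &ok);
    std::printf("cudaMalloc+cudaFree of %zu bytes: %.1f ms (%s)\n",
                bytes, ms, ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

If this call alone shows the same 1.5-3s jump only when free memory is low, that would point at the allocation rather than at the sort kernels themselves.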

Is there anything I can do to make this sort_by_key run just as fast when less memory is available? Also, what is happening between the device and host that causes the memory bump and the temporary freeze?