Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Different behavior of tensors: debugging vs normal execution on gpu

I have a program (that I didn't write) that runs on a GPU and is parallelized with CUDA v10.2. It contains the following block of code:

1  ctrs = []
2  for icls, cls_id in enumerate(pred_cls_ids):
3      cls_msk = (mask == cls_id)
4      ms = MeanShiftTorch(bandwidth=radius)
5      ctr, ctr_labels = ms.fit(pred_ctr[cls_msk, :])
6      ctrs.append(ctr.detach().contiguous().cpu().numpy())
7  ctrs = torch.from_numpy(np.array(ctrs).astype(np.float32)).cuda()
8  print(ctrs.size())
9  n_ctrs, _ = ctrs.size()

When running this I get the following error:

  File "/media/datasets/benjamin/repos/PVN3D/pvn3d/lib/utils/pvn3d_eval_utils.py", line 60, in cal_frame_poses
    n_ctrs, _ = ctrs.size()
ValueError: not enough values to unpack (expected 2, got 1)

This is consistent with the output of the print statement in line 8:

torch.Size([0])
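
To illustrate one way this shape can arise, here is a minimal NumPy-only sketch of what line 7 does to the list. It assumes each center is a length-3 vector (suggested by the `[2, 3]` seen while debugging); `stack_centers` is a hypothetical helper, not part of PVN3D. `torch.from_numpy` keeps the shape of the array it is given, so the tensor shape follows the array shape computed here.

```python
import numpy as np

def stack_centers(centers):
    """Hypothetical stand-in for line 7: stack a list of per-class centers."""
    return np.array(centers).astype(np.float32)

# If the loop body never ran, the list is empty and the array is 1-D.
empty = stack_centers([])
print(empty.shape)

# With two centers of length 3 (assumed), the array is 2-D.
two = stack_centers([np.zeros(3), np.ones(3)])
print(two.shape)

# Unpacking a 1-D shape into two names raises the ValueError from the traceback.
try:
    n_ctrs, _ = empty.shape
except ValueError as err:
    print("ValueError:", err)
```

If this is what happens in the failing run, the one-element size would come from an empty list (for example, an empty `pred_cls_ids`) rather than from where the tensor is stored.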

A few resources online that discuss tensor.shape and tensor.size() suggest casting the result of tensor.size() to a list. In my case that means:

8  print(list(ctrs.size()))
9  n_ctrs, _ = list(ctrs.size())

This produces the same error though:

  File "/media/datasets/benjamin/repos/PVN3D/pvn3d/lib/utils/pvn3d_eval_utils.py", line 60, in cal_frame_poses
    n_ctrs, _ = list(ctrs.size())
ValueError: not enough values to unpack (expected 2, got 1)

The output of line 8 is now:

[0]

If I understand this correctly, the problem is not the type of the output. Instead, I suspect I need to "assemble" my parallelized tensors to get the actual size of ctrs. I come to this conclusion because I can run the program in the debugger without any ValueError of this sort. The output of the print statement while debugging is (for example):

[2, 3]

Is there any change that I can make to that code to get the correct size of the data without doing changes that potentially alter behavior of the code coming after? Something like putting together those parallelized tensors for one line of code to get the size and then immediately parallelizing them again.
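
As far as I know, `size()` on a CUDA tensor reports the full shape without any gathering step, so one thing to try is to make the array two-dimensional before it becomes a tensor. Below is a rough sketch of that idea, not a confirmed fix: `stack_centers_2d` and the `dim=3` default are assumptions for illustration (three coordinates per center, as in the `[2, 3]` debug output), and it uses NumPy only so it runs without a GPU.

```python
import numpy as np

def stack_centers_2d(centers, dim=3):
    """Hypothetical helper: always return an array of shape (n_centers, dim)."""
    arr = np.asarray(centers, dtype=np.float32)
    # reshape(-1, dim) turns an empty 1-D array into shape (0, dim)
    # and leaves an (n, dim) array unchanged.
    return arr.reshape(-1, dim)

empty = stack_centers_2d([])
n_ctrs, n_dim = empty.shape       # unpacks without error
print(n_ctrs, n_dim)

two = stack_centers_2d([np.zeros(3), np.ones(3)])
print(two.shape)

# In the original code, the result would then go through
# torch.from_numpy(...).cuda() as in line 7 (not executed here).
```

This only makes the unpacking succeed. Code after it would still have to handle `n_ctrs == 0`, and it does not explain why the list is empty in a normal run but not in the debugger.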

question from:https://stackoverflow.com/questions/66063723/different-behavior-of-tensors-debugging-vs-normal-execution-on-gpu
