nlp - PyTorch GPU memory leak during inference

I am trying to encode documents sentence-wise with a Hugging Face transformer model. I'm using the very small google/bert_uncased_L-2_H-128_A-2 pretrained model with the following code:

import time

import nltk   # requires the 'punkt' tokenizer: nltk.download('punkt')
import torch
import torch.utils.data


def pre_encode_wikipedia(model, tokenizer, device, save_path):

  document_data_list = []

  # wikipedia_small is assumed to be defined elsewhere (a dataset with a 'text' column)
  for iteration, document in enumerate(wikipedia_small['text']):
    torch.cuda.empty_cache()

    # Per-document accumulators, seeded with a random placeholder embedding and mask values
    sentence_embeds_per_doc = [torch.randn(128)]
    attention_mask_per_doc = [1]
    special_tokens_per_doc = [1]

    # Split the document into sentences and tokenize them as one padded batch
    doc_split = nltk.sent_tokenize(document)
    doc_tokenized = tokenizer.batch_encode_plus(doc_split, padding='longest', truncation=True, max_length=512, return_tensors='pt')

    # Move the tokenized batch to the GPU
    for key, value in doc_tokenized.items():
      doc_tokenized[key] = doc_tokenized[key].to(device)

    with torch.no_grad():
      doc_encoded = model(**doc_tokenized)

    # Keep the first token's embedding of every sentence
    for sentence in doc_encoded['last_hidden_state']:
      sentence[0].to('cpu')
      sentence_embeds_per_doc.append(sentence[0])
      attention_mask_per_doc.append(1)
      special_tokens_per_doc.append(0)

    sentence_embeds = torch.stack(sentence_embeds_per_doc)
    attention_mask = torch.FloatTensor(attention_mask_per_doc)
    special_tokens_mask = torch.FloatTensor(special_tokens_per_doc)

    document_data = torch.utils.data.TensorDataset(*[sentence_embeds, attention_mask, special_tokens_mask])
    torch.save(document_data, f'{save_path}{time.strftime("%Y%m%d-%H%M%S")}{iteration}.pt')
    print(f"Document at {iteration} encoded and saved.")

After about 200-300 iterations on my local GTX 1060 (3 GB), I get an error saying that my CUDA memory is full. Running this code on Colab, which has more GPU RAM, gives me a few thousand iterations before the same thing happens.
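
Not part of the original code, but one small diagnostic that could help narrow this down is logging the CUDA allocator counters once per iteration; if the "allocated" figure keeps climbing, some tensor is still being referenced on the GPU. A sketch (the helper name log_cuda_memory is my own, not from the question):

import torch

def log_cuda_memory(tag=""):
  # memory_allocated: bytes currently occupied by live tensors on the GPU
  # memory_reserved: bytes held by PyTorch's caching allocator
  allocated = torch.cuda.memory_allocated() / 1024**2
  reserved = torch.cuda.memory_reserved() / 1024**2
  print(f"{tag} allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")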

Things I've tried:

  • Adding torch.cuda.empty_cache() at the start of every iteration to clear out previously cached tensors
  • Wrapping the forward pass in torch.no_grad() so no computation graph is built
  • Calling model.eval() to disable any stochastic behaviour (e.g. dropout) that might hold extra memory
  • Sending the output straight to the CPU in the hope of freeing up GPU memory (roughly as combined in the sketch after this list)
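
Combined, those attempts look roughly like this (a sketch only; batch stands in for the tokenized inputs built in the code above):

model.eval()                    # disable dropout and other training-time behaviour
torch.cuda.empty_cache()        # release cached blocks before the forward pass

with torch.no_grad():           # no autograd graph is recorded
  output = model(**batch)

hidden = output['last_hidden_state'].to('cpu')  # copy the activations off the GPU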

I'm baffled as to why my memory keeps overflowing. I've trained several larger models, applying all the standard practices of a training loop (optimizer.zero_grad(), etc.), and I've never had this problem. Why does it appear during this seemingly trivial task?

Edit #1: Changing sentence[0].to('cpu') to cpu_sentence = sentence[0].to('cpu') gave me a few thousand iterations before VRAM usage suddenly spiked and the run crashed.

Question from: https://stackoverflow.com/questions/65906965/pytorch-gpu-memory-leak-during-inference


1 Answer


Can you try replacing

sentence[0].to('cpu')

with

cpu_sentence = sentence[0].to('cpu')

Tensor.to() is not an in-place operation: it returns a new tensor, so the bare call does nothing and the original slice stays on the GPU. See more info here: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.to
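
A minimal sketch of how that change fits into the inner loop from the question. Appending the CPU copy instead of the original slice is an assumption on top of the one-line change above, but it is what stops a reference to the GPU-resident last_hidden_state from surviving the iteration:

for sentence in doc_encoded['last_hidden_state']:
  # .to('cpu') returns a new tensor; keep and append the CPU copy so the
  # GPU-resident last_hidden_state can be freed once the document is done
  cpu_sentence = sentence[0].to('cpu')
  sentence_embeds_per_doc.append(cpu_sentence)
  attention_mask_per_doc.append(1)
  special_tokens_per_doc.append(0)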

