I am trying to encode documents sentence-wise with a Hugging Face transformer model. I'm using the very small google/bert_uncased_L-2_H-128_A-2 pretrained model with the following code:
import time
import nltk
import torch

def pre_encode_wikipedia(model, tokenizer, device, save_path):
    document_data_list = []
    for iteration, document in enumerate(wikipedia_small['text']):
        torch.cuda.empty_cache()

        # Per-document lists, seeded with a random 128-dim placeholder embedding
        # and its corresponding mask entries.
        sentence_embeds_per_doc = [torch.randn(128)]
        attention_mask_per_doc = [1]
        special_tokens_per_doc = [1]

        # Split the document into sentences and tokenize them as one batch.
        doc_split = nltk.sent_tokenize(document)
        doc_tokenized = tokenizer.batch_encode_plus(doc_split, padding='longest',
                                                    truncation=True, max_length=512,
                                                    return_tensors='pt')
        for key, value in doc_tokenized.items():
            doc_tokenized[key] = doc_tokenized[key].to(device)

        with torch.no_grad():
            doc_encoded = model(**doc_tokenized)

        # Keep the first-token embedding of every sentence.
        for sentence in doc_encoded['last_hidden_state']:
            sentence[0].to('cpu')
            sentence_embeds_per_doc.append(sentence[0])
            attention_mask_per_doc.append(1)
            special_tokens_per_doc.append(0)

        sentence_embeds = torch.stack(sentence_embeds_per_doc)
        attention_mask = torch.FloatTensor(attention_mask_per_doc)
        special_tokens_mask = torch.FloatTensor(special_tokens_per_doc)

        document_data = torch.utils.data.TensorDataset(*[sentence_embeds, attention_mask, special_tokens_mask])
        torch.save(document_data, f'{save_path}{time.strftime("%Y%m%d-%H%M%S")}{iteration}.pt')
        print(f"Document at {iteration} encoded and saved.")
After about 200-300 iterations on my local GTX 1060 (3 GB) I get a CUDA out-of-memory error. Running the same code on Colab, which has more GPU RAM, gets me a few thousand iterations before memory fills up there as well.
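For context, this is roughly how the growth could be confirmed per iteration (a quick sketch, not part of the loop above; the helper name log_cuda_memory is just for illustration):

import torch

def log_cuda_memory(iteration):
    # Memory held by live tensor references vs. blocks cached by the allocator.
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"iter {iteration}: allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")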
Things I've tried:
- Adding torch.cuda.empty_cache() at the start of every iteration to clear out previously held tensors
- Wrapping the forward pass in torch.no_grad() so no computation graph is built
- Setting model.eval() to disable dropout and any other stochastic behaviour that might hold memory
- Sending the output straight to the CPU in the hope of freeing up GPU memory (a combined sketch of these steps follows this list)
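For reference, this is the overall pattern I'm trying to follow with the points above (a simplified sketch with placeholder names, not my actual function):

import torch

def encode_sentences(model, tokenizer, sentences, device):
    # eval mode, no_grad, move results to the CPU, drop GPU references, empty the cache.
    model.eval()
    batch = tokenizer(sentences, padding='longest', truncation=True,
                      max_length=512, return_tensors='pt').to(device)
    with torch.no_grad():
        output = model(**batch)
    cls_embeds = output.last_hidden_state[:, 0, :].detach().cpu()  # CPU copies of the first-token vectors
    del output, batch            # drop references to the GPU tensors
    torch.cuda.empty_cache()     # let the allocator release cached blocks
    return cls_embeds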
I'm baffled as to why my memory keeps overflowing. I've trained several larger models, applying all the standard practices of a training loop (optimizer.zero_grad(), etc.), and I've never had this problem. Why does it show up during this seemingly trivial task?
Edit #1:
Changing sentence[0].to('cpu') to cpu_sentence = sentence[0].to('cpu') gave me a few thousand iterations before VRAM usage suddenly spiked and the run crashed.
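As far as I understand it now, the original line did nothing because .to('cpu') is not in-place; a toy illustration (not from the actual run):

import torch

gpu_tensor = torch.randn(128, device='cuda')
gpu_tensor.to('cpu')                        # returns a new tensor that is immediately discarded
cpu_copy = gpu_tensor.to('cpu')             # the copy that actually lives on the CPU
print(gpu_tensor.device, cpu_copy.device)   # cuda:0 cpu
# Appending sentence[0] therefore still stores a CUDA tensor in the Python list.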