I'm reading through code that iterates over a directory of text files and does some preprocessing. I'm not very familiar with Python's built-in next() function, or with files in general (beyond simple opens and prints).
Here's the method that errors out:
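```python
import codecs
import logging

logger = logging.getLogger(__name__)


def create_vocab(file_path, prompt_id, vocab_size, tokenize_text, to_lower):
    logger.info('Creating vocabulary from: ' + file_path)
    total_words, unique_words = 0, 0
    word_freqs = {}
    with codecs.open(file_path, mode='r', encoding='utf-8') as input_file:
        print("file:", input_file)
        print("path:", file_path)
        next(input_file)  # read and discard the first line (presumably the header row)
        for line in input_file:
            tokens = line.strip().split('\t')  # .tsv file: split each line on tabs
            print("line: ", line)
            print("tokens:", tokens)
            # ... rest of create_vocab not shown (line 62 in the traceback is essay_id = int(tokens[0]))
```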
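For reference, here is a minimal sketch of what next() does to a file object, assuming the first line of train.tsv is a header row. A file object is an iterator over its lines, so next(input_file) reads and discards the first line, and the for loop then picks up from the second line. io.StringIO stands in for the real file; the helper name read_rows and the sample column names and values are made up for illustration.

```python
import io

# Hypothetical stand-in for data/fold_0/train.tsv: a header row, then data rows.
sample = "essay_id\tprompt\ttext\n1\t3\thello world\n2\t3\tgoodbye\n"


def read_rows(f):
    """Skip the first line with next(), then split the remaining lines on tabs."""
    header = next(f)  # consumes line 1; the loop below starts at line 2
    rows = []
    for line in f:
        rows.append(line.strip().split('\t'))
    return header, rows


header, rows = read_rows(io.StringIO(sample))
print(repr(header))  # 'essay_id\tprompt\ttext\n'
print(rows)          # [['1', '3', 'hello world'], ['2', '3', 'goodbye']]
```

Because the file object remembers its position, the for loop never sees the line that next() already consumed.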
And here's the output:
```
file: <codecs.StreamReaderWriter object at 0x000002ADB5605EB0>
path: data/fold_0/train.tsv
line:
tokens: ['']
Traceback (most recent call last):
  File "attn_network.py", line 154, in <module>
    main()
  File "attn_network.py", line 74, in main
    data_prepare.prepare_sentence_data(
  File "C:\Users\seans\OneDrive\Desktop\School\Research\Litman_Lab\code\co-attention\CO_ATTN\data_prepare.py", line 30, in prepare_sentence_data
    reader.get_data(
  File "C:\Users\seans\OneDrive\Desktop\School\Research\Litman_Lab\code\co-attention\CO_ATTN\reader.py", line 197, in get_data
    vocab = create_vocab(train_path, prompt_id, vocab_size, tokenize_text, to_lower)
  File "C:\Users\seans\OneDrive\Desktop\School\Research\Litman_Lab\code\co-attention\CO_ATTN\reader.py", line 62, in create_vocab
    essay_id = int(tokens[0])
ValueError: invalid literal for int() with base 10: ''
```
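The output is consistent with the line being blank (only a newline or other whitespace): strip() turns it into '', ''.split('\t') returns [''], so tokens[0] is '' and int('') raises exactly this ValueError. Below is a minimal sketch that reproduces this and skips blank lines, assuming blank lines carry no data and that the first column holds an integer ID; whether skipping is right for this dataset is an assumption, and the function name parse_ids and the sample data are hypothetical.

```python
import io


def parse_ids(f):
    """Collect the integer ID in the first column, skipping blank lines (hypothetical helper)."""
    next(f)  # skip the header row
    ids = []
    for line in f:
        if not line.strip():  # blank line: it would split to [''] and int('') would fail
            continue
        tokens = line.strip().split('\t')
        ids.append(int(tokens[0]))
    return ids


print(''.split('\t'))  # ['']
try:
    int('')
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: ''

# A blank second line mimics the situation in the output above.
print(parse_ids(io.StringIO("essay_id\ttext\n\n7\tfoo\n8\tbar\n")))  # [7, 8]
```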
It feels like I'm missing something here, and I suspect that something is next(). Everything prints except line. Thanks again for the help!
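On next() itself: it is a built-in function that asks an iterator for its next item and advances the iterator by one. When the iterator is exhausted it raises StopIteration, unless a default is passed as the second argument. A short sketch of both cases (the helper name first_line_or_none is hypothetical):

```python
import io


def first_line_or_none(f):
    """Return the first line of a file-like object, or None if it is empty (hypothetical helper)."""
    # With a default argument, next() returns it instead of raising StopIteration.
    return next(f, None)


lines = iter(["header\n", "row 1\n"])
first = next(lines)        # 'header\n'; the iterator now points at the second item
second = next(lines)       # 'row 1\n'
third = next(lines, None)  # None: the iterator is exhausted, so the default is returned

try:
    next(iter([]))         # no items and no default
except StopIteration:
    print("StopIteration raised on an empty iterator")

print(first_line_or_none(io.StringIO("")))              # None
print(repr(first_line_or_none(io.StringIO("a\nb\n"))))  # 'a\n'
```

Without the default, calling next(input_file) on an empty file would raise StopIteration before the loop even starts.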
question from:
https://stackoverflow.com/questions/65850957/what-exactly-does-next-do