I'm using TensorFlow's CsvDataset to read data from disk during training.
import pathlib
import tensorflow as tf

def preprocess(*fields):
    print(len(fields))
    features = tf.stack(fields[:-1])
    labels = tf.stack([int(x) for x in fields[-1:]])
    return features, labels  # x, y
training_csvs = sorted(str(p) for p in pathlib.Path('./../Dataset/Train').glob("*/*.csv"))

# `defaults` and `selected_indices` are defined elsewhere in my script
training_dataset = tf.data.experimental.CsvDataset(
    training_csvs,
    record_defaults=defaults,
    compression_type=None,
    buffer_size=None,
    header=True,
    field_delim=',',
    use_quote_delim=True,
    na_value="",
    select_cols=selected_indices
)
training_dataset = training_dataset.map(preprocess)
training_dataset = training_dataset.shuffle(50000)

validate_ds = training_dataset.batch(50).take(100)
train_ds = training_dataset.batch(50, drop_remainder=True).skip(100)

for f, l in train_ds.take(1):  # here it throws the error for one of the three datasets
    print(f)
    print(l)
This reading code works for two of my datasets, but for a third dataset it throws:
InvalidArgumentError: Expect 712 fields but have 711 in record [Op:IteratorGetNext]
As I understand it, this means that at least one record somewhere has only 711 fields where 712 are expected, i.e. some of my CSV files contain corrupted rows. How can I debug the dataset iterator to get the name/folder name of the offending files?
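The only workaround I can think of is to scan the files one at a time, so that the exception surfaces together with a concrete file path. This is just a sketch, assuming `defaults` and `selected_indices` are the same values used above:

# scan each CSV file in its own single-file dataset so a parsing
# failure can be matched to the file that caused it
for path in training_csvs:
    probe = tf.data.experimental.CsvDataset(
        path,
        record_defaults=defaults,
        header=True,
        field_delim=',',
        use_quote_delim=True,
        na_value="",
        select_cols=selected_indices
    )
    try:
        for _ in probe:  # force every record in this file to be parsed
            pass
    except tf.errors.InvalidArgumentError as e:
        print(f"bad record in {path}: {e.message}")

Is per-file scanning like this the only option, or does tf.data expose which file the iterator was reading when it failed?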