I am looping through files and appending some of them to one data frame, some to another data frame, and some to yet another data frame. At the end of the append process, before I save each file, I want to check whether there is any text in the 1st through 10th fields, and delete the entire row if any of those fields contains a non-numeric value. I want the check to start on row #2, because these files actually have two header rows (row 0 and row 1). The data frames all have different field names, so I can't identify the fields by name; I have to refer to them by position (the first, second, third field, and so on). I googled for a solution but couldn't find anything that does what I described here. Here is some sample code that I'm using to append together files with similar names.
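import glob
import os

import pandas as pd

mylist = [
    'FFIEC CDR Call Bulk POR',
    'FFIEC CDR Call Schedule CI',
    'FFIEC CDR Call Schedule ENT',
]
# Raw strings, so the backslashes in the Windows paths are not read as escapes
path = r'C:\Users\ryans\Downloads'
out_dir = r'C:\Users\ryans\OneDrive\Desktop\merged_files'
all_files = glob.glob(os.path.join(path, "*.txt"))
# One list of DataFrames per file-name pattern
all_df = {
    'FFIEC CDR Call Bulk POR': [],
    'FFIEC CDR Call Schedule CI': [],
    'FFIEC CDR Call Schedule ENT': [],
}
# --- first loop ---
for f in all_files:
    for x in mylist:
        if x in f:
            try:
                # Files assumed to be tab-delimited (delimiter='' is not a valid separator)
                df = pd.read_csv(f, delimiter='\t')
                df['file'] = os.path.basename(f)
                all_df[x].append(df)
            except Exception:
                print(f + ' seems to have some bad data points. please check and confirm!')

# --- combine and export, one CSV per name pattern ---
for x, frames in all_df.items():
    if not frames:
        continue  # pd.concat raises on an empty list
    df_append = pd.concat(frames, ignore_index=True)
    ### I WANT TO CHECK FOR NON-NUMERIC CHARACTERS HERE. I WANT TO DELETE ANY ROWS WITH NON-NUMERIC DATA (STRINGS, DATES, ETC.) BEFORE EXPORTING THE RESULTS TO A CSV FILE.
    df_append.to_csv(os.path.join(out_dir, x + ".csv"))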
Here is some sample data. In this scenario, I want to delete row 12.
There will be lots of other rows below this one that contain text.
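A minimal sketch of the check, assuming pandas and assuming the fields to test are the first ten columns by position. `drop_rows_with_text`, `n_cols`, and `keep_first` are hypothetical names chosen for illustration. Each leading column is passed through `pd.to_numeric(..., errors="coerce")`, so anything that cannot be parsed as a number (strings, dates, etc.) becomes NaN. A row is dropped if any of those cells held a value that failed to convert. Treating empty cells as acceptable rather than as text is an assumption here.

```python
import pandas as pd


def drop_rows_with_text(df: pd.DataFrame, n_cols: int = 10, keep_first: int = 2) -> pd.DataFrame:
    """Drop rows whose first `n_cols` columns contain any non-numeric value.

    Columns are selected by position, so their names do not matter.
    The first `keep_first` rows (e.g. extra header lines read in as data)
    are never checked and are always kept. Empty cells (NaN/None) are
    assumed to be acceptable and are not counted as text.
    """
    block = df.iloc[:, :n_cols]
    # Convert each leading column to numbers; unparseable values become NaN
    numeric = block.apply(pd.to_numeric, errors="coerce")
    # A cell counts as "text" if it had a value that failed to convert
    is_text = numeric.isna() & block.notna()
    bad_row = is_text.any(axis=1).to_numpy(copy=True)
    # Never drop the protected leading rows
    bad_row[:keep_first] = False
    return df.loc[~bad_row]
```

With `header=None`, both header lines are read in as rows 0 and 1, so `keep_first=2` protects them. With the default `header=0`, the first line becomes the column names and `keep_first=1` is enough. Each file carries its own header lines, so after concatenation they end up in the middle of the combined frame. It is therefore safer to filter each file right after `pd.read_csv`, for example `df = drop_rows_with_text(df, keep_first=1)`. Do this before the `file` column is added, because that column holds text and could fall inside the first ten positions.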
question from:
https://stackoverflow.com/questions/65648073/how-can-we-delete-every-row-with-text-in-the-first-second-or-third-field