I'm appending multiple text files into one single data frame. For some reason that I don't fully understand, the column names change slightly over time, even though they refer to the same thing. Here's an example:
['ACCEPTANCES_EXECUTED_FOR_ACCT____OUT',
'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1',
'ACCUMULATED_OTH_COMPREHENSIVE_INCOME',
'ACCUMULATED_OTH_COMPREHENSIVE_INCOME_1',
'ALLL_AMT',
'ALLL_AMT_1',
'AUDIT_INDICATOR',
'AUDIT_INDICATOR_1',
'AVAILABLE_FOR_SALE_SECURITIES',
'AVAILABLE_FOR_SALE_SECURITIES_1',
'COMMON_STOCK',
'COMMON_STOCK_1',
'file']
I know that 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT' and 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1' are the same. Similarly, 'ACCUMULATED_OTH_COMPREHENSIVE_INCOME' and 'ACCUMULATED_OTH_COMPREHENSIVE_INCOME_1' are the same.
Is there a way to get the field named 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1' appended under the field named 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT' and then drop 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1'?
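To make the desired result concrete, here is a minimal sketch of folding each "_1" column into its un-suffixed twin after the frames have already been combined. The helper name `coalesce_suffixed` and the toy data are hypothetical, and the sketch assumes every row has a value in at most one of the two columns.

```python
import pandas as pd

def coalesce_suffixed(df, suffix="_1"):
    """Fold each 'NAME<suffix>' column into 'NAME', then drop the suffixed column."""
    out = df.copy()
    for col in list(out.columns):
        if not col.endswith(suffix):
            continue
        base = col[: -len(suffix)]
        if base in out.columns:
            # Keep values already in the base column; fill its gaps from the suffixed one.
            out[base] = out[base].combine_first(out[col])
            out = out.drop(columns=col)
        else:
            # No un-suffixed twin exists, so a plain rename is enough.
            out = out.rename(columns={col: base})
    return out

# Toy frame shaped like the result of appending two files with differing headers.
demo = pd.DataFrame({
    "ALLL_AMT": [100.0, None],
    "ALLL_AMT_1": [None, 250.0],
    "file": ["a.txt", "b.txt"],
})
merged = coalesce_suffixed(demo)
print(merged)
```

If both columns can hold a value on the same row, `combine_first` keeps the un-suffixed one, which may or may not be what is wanted.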
Or, is it possible that there is a problem with my append? I think it's just a standard append process.
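import os
import pandas as pd

try:
    df = pd.read_csv(f, delimiter='', skiprows=1)
    df['file'] = os.path.basename(f)  # record which file each row came from
    all_df[x].append(df)
except Exception as e:
    # report the failing file along with the underlying error
    print(f + ' seems to have some bad data points. please check and confirm! (' + str(e) + ')')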
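On the append itself: pandas aligns frames by column name when they are combined, so a header that differs by a "_1" suffix ends up as a separate column padded with NaN. A possible alternative, sketched below, is to normalise the headers per file before combining. The in-memory `files` dictionary is a made-up stand-in for the real text files, and the sketch assumes a trailing "_1" is the only difference between header variants.

```python
import io
import pandas as pd

# Two in-memory stand-ins for the text files; the second uses the "_1" header variant.
files = {
    "a.txt": "COMMON_STOCK,ALLL_AMT\n10,100\n",
    "b.txt": "COMMON_STOCK_1,ALLL_AMT_1\n20,250\n",
}

frames = []
for name, text in files.items():
    df = pd.read_csv(io.StringIO(text))
    # Strip a trailing "_1" so both header variants map to the same column name.
    df.columns = [c[:-2] if c.endswith("_1") else c for c in df.columns]
    df["file"] = name
    frames.append(df)

# concat aligns on column names; without the rename above, each variant would
# become its own column padded with NaN.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Collecting the frames in a list and calling `pd.concat` once at the end also avoids repeatedly copying a growing data frame.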