Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Can we append similarly named columns onto one column?

I'm appending multiple text files into one single data frame. For some reason that I don't fully understand, the column names change slightly over time, even though they refer to the same thing. Here's an example.

['ACCEPTANCES_EXECUTED_FOR_ACCT____OUT',
 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1',
 'ACCUMULATED_OTH_COMPREHENSIVE_INCOME',
 'ACCUMULATED_OTH_COMPREHENSIVE_INCOME_1',
 'ALLL_AMT',
 'ALLL_AMT_1',
 'AUDIT_INDICATOR',
 'AUDIT_INDICATOR_1',
 'AVAILABLE_FOR_SALE_SECURITIES',
 'AVAILABLE_FOR_SALE_SECURITIES_1',
 'COMMON_STOCK',
 'COMMON_STOCK_1',
 'file']

I know that 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT' and 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1' are the same. Similarly, 'ACCUMULATED_OTH_COMPREHENSIVE_INCOME' and 'ACCUMULATED_OTH_COMPREHENSIVE_INCOME_1' are the same.

Is there a way to get the field named 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1' appended under the field named 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT' and then drop the 'ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1'?

Or, is it possible that there is a problem with my append? I think it's just a standard append process.

import os
import pandas as pd

try:
    df = pd.read_csv(f, delimiter='', skiprows=1)
    df['file'] = os.path.basename(f)  # record which file each row came from
    all_df[x].append(df)
except Exception:
    print(f + ' seems to have some bad data points. please check and confirm!')


1 Answer


Here are two potential solutions. First, if every file has the same columns in the same order, you can overwrite the column names before stacking: df.columns = all_df.columns, where all_df stands for a DataFrame that already carries the names you want.
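A minimal sketch of the positional approach, assuming two small hypothetical frames (first and later) standing in for the text files. The names and values are made up for illustration; the length check is an added safeguard, not part of the original suggestion.

```python
import pandas as pd

# Hypothetical frames standing in for two of the text files.
first = pd.DataFrame({"ALLL_AMT": [1, 2], "COMMON_STOCK": [10, 20]})
later = pd.DataFrame({"ALLL_AMT_1": [3], "COMMON_STOCK_1": [30]})

# Positional renaming is only safe when both frames have the same
# columns in the same order, so at least check the counts first.
if len(later.columns) != len(first.columns):
    raise ValueError("column counts differ; positional renaming is unsafe")
later.columns = first.columns

# With matching names, the rows stack under the same columns.
combined = pd.concat([first, later], ignore_index=True)
print(combined)
```

Note that a matching column count does not prove the order is the same, so it is worth eyeballing a few files before relying on this.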

Second, if the change is always just an added "_1", and that substring doesn't appear anywhere else in a column name, you can strip it with .replace('_1', ''), along the lines of df.columns = [x.replace('_1', '') for x in df.columns]. Keep in mind that str.replace removes every occurrence of '_1', not only the one at the end of the name.
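A slightly more defensive sketch of the same idea, assuming the suffix is always a trailing "_1". It uses a regex anchored to the end of the name so that a "_1" in the middle is left alone. The helper names (strip_suffix, coalesce_suffixed) and the TIER_1_CAPITAL column are hypothetical. The second helper covers the case where the files were already stacked and the "_1" columns sit beside their counterparts.

```python
import re
import pandas as pd

def strip_suffix(name, suffix="_1"):
    """Remove `suffix` only when it is at the very end of the name."""
    return re.sub(re.escape(suffix) + r"$", "", name)

def coalesce_suffixed(df, suffix="_1"):
    """Fold each NAME_1 column into NAME (filling gaps), then drop NAME_1."""
    out = df.copy()
    for col in df.columns:
        if col.endswith(suffix):
            base = col[: -len(suffix)]
            if base in out.columns:
                out[base] = out[base].combine_first(out[col])
                out = out.drop(columns=col)
    return out

# Hypothetical frames standing in for two of the text files.
frames = [
    pd.DataFrame({"ALLL_AMT": [1], "TIER_1_CAPITAL": [5]}),
    pd.DataFrame({"ALLL_AMT_1": [2], "TIER_1_CAPITAL_1": [6]}),
]

# Option A: clean the names first, then stack.
combined = pd.concat(
    [f.rename(columns=strip_suffix) for f in frames], ignore_index=True
)

# Option B: the frames were already stacked with mismatched names.
stacked = pd.concat(frames, ignore_index=True)
merged = coalesce_suffixed(stacked)
```

Option B turns integer columns into floats, because the stacked frame contains NaN where a file lacked the column. Either option would also rename a column whose real name happens to end in "_1", so check the source files for such names first.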

Beyond that, you'd probably have to do something fancier with fuzzy string matching.
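As one possible take on fuzzy matching, here is a rough sketch using difflib from the standard library. The canonical list, the match_column helper, the 0.8 cutoff, and the misspelled COMON_STOCK are all assumptions for illustration; a dedicated fuzzy-matching library may suit real data better.

```python
import difflib

# Names you consider canonical (taken from the example above).
canonical = [
    "ACCEPTANCES_EXECUTED_FOR_ACCT____OUT",
    "ALLL_AMT",
    "COMMON_STOCK",
]

def match_column(name, known, cutoff=0.8):
    """Return the closest known name, or `name` unchanged if nothing is close."""
    hits = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return hits[0] if hits else name

# Column names as they might arrive in a later file.
incoming = [
    "ACCEPTANCES_EXECUTED_FOR_ACCT____OUT_1",
    "ALLL_AMT_1",
    "COMON_STOCK",  # hypothetical misspelling
    "file",
]

# Build an old-name -> canonical-name mapping for df.rename(columns=mapping).
mapping = {name: match_column(name, canonical) for name in incoming}
print(mapping)
```

Fuzzy matching can silently merge two columns that are genuinely different but similarly named, so it is safer to review the mapping before applying it.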

