I have two dataframes with these indices
Index(['RCFD1410', 'RCFD1420', 'RCFD1460', 'RCFD1563', 'RCFD1590', 'RCFD1616',
'RCFD1616', 'RCFD1763', 'RCFD1764', 'RCFD1797',
...
'Unnamed: 163_level_0', 'Unnamed: 195_level_0', 'Unnamed: 197_level_0',
'Unnamed: 204_level_0', 'Unnamed: 79_level_0', 'Unnamed: 81_level_0',
'Unnamed: 85_level_0', 'Unnamed: 92_level_0', 'IDRSSD', 'qyear'],
dtype='object', length=240)
df_list.columns
Index(['IDRSSD', 'Line_Item'], dtype='object')
I am trying to select a subset of columns, so I don't get duplicating column names.
cols_to_use = df_list.columns.difference(grouped_and_summed.columns)
df_merged = pd.merge(grouped_and_summed, df_list[cols_to_use], on='IDRSSD')
This gives me an error: KeyError: 'IDRSSD'
That seems weird, because I thought the index names would match, based on 'IDRSSD' in both dataframes.
Now, if I do this.
cols_to_use = df_list.columns.difference(grouped_and_summed.columns)
df_merged = pd.merge(grouped_and_summed, df_list[cols_to_use], left_index=True, right_index=True, how='inner')
I get df_merged, but the size of df_merged is [239 rows x 241 columns]. The size of grouped_and_summed is [523454 rows x 240 columns]. Over 500k rows are chopped off!
Finally, if I do this.
cols_to_use = df_list.columns.difference(grouped_and_summed.columns)
df_merged = pd.merge(grouped_and_summed, df_list[cols_to_use], left_index=True, right_index=True, how='left')
I get a result like this.
IDRSSD qyear Line_Item
0 1000052 2001Q1 NaN
1 1000052 2001Q2 REAL ESTATE LOANS, TOTAL
2 1000052 2001Q3 R.E. LOANS-FARMLAND
3 1000052 2001Q4 R.E. LOANS MULTI-FAMILY
4 1000052 2002Q1 OTHER LOANS
Shouldn't Line_Item be repeating if IDRSSD is repeating?! My question is, how can I match Line_Item to IDRSSD? There may be a simple fix for this, but I'm just not seeing it here.
question from:
https://stackoverflow.com/questions/65849445/cant-seem-to-merge-corectly-on-index