Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
2.8k views
in Technique by (71.8m points)

python - Convert arrays stored in pandas columns to new DataFrame columns, then append and map the resulting values back to the original DataFrame

Python newbie here: I am comfortable with pandas, but I know next to nothing about arrays.

I have a dataframe df with four columns: Name, colA, Income and colB.

  • My primary goal is to carry out data analysis on this dataset. The challenge is that two of its columns, colA and colB (see below), contain arrays.

  • The key thing to know: colA and colB are features extracted from image data.

I want to do the following:

  1. Convert the arrays in colA and colB into regular columns, like Name and Income.
  2. Map the converted arrays (the new columns) back to the corresponding row/index in the original dataframe.
  3. Give the new columns names such as colA1, colA2, colA3, ..., colB1, colB2, colB3, colB4, ..., so that one can tell which original column each new column was derived from.
df

index  Name   colA                         Income  colB
1      Peter  [[[3,4],[3,9],[3,0],[2,1]]]   32100  [[3,4,1,3,1],[1,2,2,2,1],[6,5,0,1,1],[1,2,1,1,1]]
2      John   [[[1,2],[3,5],[1,0],[0,1]]]   43256  [[5,4,2,3,4],[5,1,2,2,5],[7,5,0,1,2],[4,2,1,1,3]]
3      Mark   [[[5,8],[5,9],[1,0],[1,4]]]   29811  [[4,4,1,3,2],[6,2,2,2,8],[6,1,0,1,3],[9,2,1,9,9]]
4      Jane   [[[8,4],[1,2],[5,3],[1,8]]]  134500  [[3,4,7,3,7],[1,2,5,6,2],[6,5,1,3,2],[9,2,3,2,5]]
5      Jill   [[[6,6],[2,1],[1,1],[5,6]]]  233120  [[5,4,5,3,9],[1,2,5,2,0],[0,5,0,4,2],[1,5,1,6,1]]
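To make the question reproducible, the sample data above can be rebuilt as a DataFrame like this. It is a sketch that assumes the "arrays" are plain Python lists (not NumPy arrays) and uses the 1-based index shown in the printout:

```python
import pandas as pd

# Sample data from the question; cells in colA and colB hold nested Python lists
# (an assumption: the real data may hold NumPy arrays instead)
df = pd.DataFrame(
    {
        "Name": ["Peter", "John", "Mark", "Jane", "Jill"],
        "colA": [
            [[[3, 4], [3, 9], [3, 0], [2, 1]]],
            [[[1, 2], [3, 5], [1, 0], [0, 1]]],
            [[[5, 8], [5, 9], [1, 0], [1, 4]]],
            [[[8, 4], [1, 2], [5, 3], [1, 8]]],
            [[[6, 6], [2, 1], [1, 1], [5, 6]]],
        ],
        "Income": [32100, 43256, 29811, 134500, 233120],
        "colB": [
            [[3, 4, 1, 3, 1], [1, 2, 2, 2, 1], [6, 5, 0, 1, 1], [1, 2, 1, 1, 1]],
            [[5, 4, 2, 3, 4], [5, 1, 2, 2, 5], [7, 5, 0, 1, 2], [4, 2, 1, 1, 3]],
            [[4, 4, 1, 3, 2], [6, 2, 2, 2, 8], [6, 1, 0, 1, 3], [9, 2, 1, 9, 9]],
            [[3, 4, 7, 3, 7], [1, 2, 5, 6, 2], [6, 5, 1, 3, 2], [9, 2, 3, 2, 5]],
            [[5, 4, 5, 3, 9], [1, 2, 5, 2, 0], [0, 5, 0, 4, 2], [1, 5, 1, 6, 1]],
        ],
    },
    index=range(1, 6),  # 1-based index, as in the printout above
)

print(df.dtypes)  # colA and colB are stored as object columns of lists
```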

Desired output:

# The new df should look something like the example below, or something more appropriate for data analysis in a pandas dataframe.

index, Name, Income, colA1, colA2, ..., colA8, colB1, colB2, ..., colB20
1, Peter, 32100, 3,4,3,9,3,0,2,1, 3,4,1,3,1,1,2,2,2,1,6,5,0,1,1,1,2,1,1,1
2, John, 43256, 1,2,3,5,1,0,0,1, 5,4,2,3,4,5,1,2,2,5,7,5,0,1,2,4,2,1,1,3
3, Mark, 29811, 5,8,5,9,1,0,1,4, 4,4,1,3,2,6,2,2,2,8,6,1,0,1,3,9,2,1,9,9
4, Jane, 134500, 8,4,1,2,5,3,1,8, 3,4,7,3,7,1,2,5,6,2,6,5,1,3,2,9,2,3,2,5
5, Jill, 233120, 6,6,2,1,1,1,5,6, 5,4,5,3,9,1,2,5,2,0,0,5,0,4,2,1,5,1,6,1

Unfortunately, I don't have any code to show, because I don't know my way around arrays. Thanks for any help.


He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back into you…

1 Answer

0 votes
by (71.8m points)

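Step 1: Flatten the nested lists inside each column. Each colA cell wraps its pairs in one extra outer list, so the comprehension starts from x[0]; colB is only two levels deep and needs no such step.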

df['colA'] = df['colA'].apply(lambda x: [item for sublist in x[0] for item in sublist])

0    [3, 4, 3, 9, 3, 0, 2, 1]
1    [1, 2, 3, 5, 1, 0, 0, 1]
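Applied to a single colA cell, the Step 1 comprehension works as follows. The sketch also shows itertools.chain.from_iterable, a standard-library equivalent that is not part of the original answer; the variable names are illustrative only:

```python
from itertools import chain

# One colA cell from the question: an outer list wrapping four [x, y] pairs
cell = [[[3, 4], [3, 9], [3, 0], [2, 1]]]

# cell[0] strips the extra outer list, leaving the four pairs
pairs = cell[0]

# The Step 1 comprehension: outer loop over the pairs, inner loop over each pair's values
flat = [item for sublist in pairs for item in sublist]

# Same result with the standard library
flat_chain = list(chain.from_iterable(pairs))

print(flat)        # [3, 4, 3, 9, 3, 0, 2, 1]
print(flat_chain)  # [3, 4, 3, 9, 3, 0, 2, 1]
```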

df['colB'] = df['colB'].apply(lambda x: [item for sublist in x for item in sublist])

0    [3, 4, 1, 3, 1, 1, 2, 2, 2, 1, 6, 5, 0, 1, 1, ...
1    [5, 4, 2, 3, 4, 5, 1, 2, 2, 5, 7, 5, 0, 1, 2, ...
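If the cells might be NumPy arrays rather than lists, or you would rather not special-case colA's extra nesting level, np.ravel can flatten both columns the same way. This is a swapped-in alternative to the answer's comprehensions, not the answer's own method, and it assumes every cell is rectangular (all sublists the same length); the two-row frame is a trimmed copy of the question's data:

```python
import numpy as np
import pandas as pd

# First two rows of the question's data (default 0-based index, as in the answer)
df = pd.DataFrame(
    {
        "Name": ["Peter", "John"],
        "colA": [
            [[[3, 4], [3, 9], [3, 0], [2, 1]]],
            [[[1, 2], [3, 5], [1, 0], [0, 1]]],
        ],
        "Income": [32100, 43256],
        "colB": [
            [[3, 4, 1, 3, 1], [1, 2, 2, 2, 1], [6, 5, 0, 1, 1], [1, 2, 1, 1, 1]],
            [[5, 4, 2, 3, 4], [5, 1, 2, 2, 5], [7, 5, 0, 1, 2], [4, 2, 1, 1, 3]],
        ],
    }
)

# np.ravel flattens any rectangular nesting depth, so one loop covers
# colA (three levels deep) and colB (two levels deep) without the x[0] step
for col in ["colA", "colB"]:
    df[col] = df[col].apply(lambda x: np.ravel(x).tolist())

print(df.loc[0, "colA"])  # [3, 4, 3, 9, 3, 0, 2, 1]
```

np.ravel will not flatten ragged cells (recent NumPy versions raise an error for them), so the comprehension approach is the safer choice if the feature arrays can differ in length.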

Step 2: Build new columns from the lists with a prefix, concatenate them with the dataframe and drop the original list columns:

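import pandas as pd

# Spread each list over its own columns (colA0, colA1, ...), keep the original row index,
# put the new columns in front of df, then drop the original list column
df = pd.concat([pd.DataFrame(df.colA.tolist(), index=df.index).add_prefix('colA'), df], axis=1).drop('colA', axis=1)
df = pd.concat([pd.DataFrame(df.colB.tolist(), index=df.index).add_prefix('colB'), df], axis=1).drop('colB', axis=1)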



   colB0  colB1  colB2  colB3  colB4  colB5  colB6  colB7  colB8  colB9  ...  
0      3      4      1      3      1      1      2      2      2      1  ...   
1      5      4      2      3      4      5      1      2      2      5  ...   

   colA0  colA1  colA2  colA3  colA4  colA5  colA6  colA7   Name  Income  
0      3      4      3      9      3      0      2      1  Peter   32100  
1      1      2      3      5      1      0      0      1   John   43256 
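The result above numbers the new columns from 0 (colA0, colB0, ...) and places them before Name and Income, whereas the question asked for colA1, colA2, ... after Name and Income. The sketch below does both in one pass. expand_list_column is a hypothetical helper name, the data is the first two rows of the question with its 1-based index, and np.ravel assumes every cell is rectangular:

```python
import numpy as np
import pandas as pd


def expand_list_column(df, col):
    """Hypothetical helper: flatten every cell of `col` and spread the values
    over new columns named col1, col2, ... (1-based, as in the question)."""
    # np.ravel flattens any rectangular nesting depth (colA: 3 levels, colB: 2)
    flat = df[col].apply(lambda x: np.ravel(x).tolist())
    # index=df.index keeps each flattened row attached to its original row label
    wide = pd.DataFrame(flat.tolist(), index=df.index)
    wide.columns = [f"{col}{i}" for i in range(1, wide.shape[1] + 1)]
    return wide


# First two rows of the question's data, with its 1-based index
df = pd.DataFrame(
    {
        "Name": ["Peter", "John"],
        "colA": [
            [[[3, 4], [3, 9], [3, 0], [2, 1]]],
            [[[1, 2], [3, 5], [1, 0], [0, 1]]],
        ],
        "Income": [32100, 43256],
        "colB": [
            [[3, 4, 1, 3, 1], [1, 2, 2, 2, 1], [6, 5, 0, 1, 1], [1, 2, 1, 1, 1]],
            [[5, 4, 2, 3, 4], [5, 1, 2, 2, 5], [7, 5, 0, 1, 2], [4, 2, 1, 1, 3]],
        ],
    },
    index=[1, 2],
)

# Name and Income first, then colA1..colA8, then colB1..colB20
result = pd.concat(
    [df[["Name", "Income"]], expand_list_column(df, "colA"), expand_list_column(df, "colB")],
    axis=1,
)
print(result.columns.tolist()[:4])  # ['Name', 'Income', 'colA1', 'colA2']
print(result.shape)                 # (2, 30)
```

Passing index=df.index is what maps each flattened row back to its original row (goal 2 in the question). Without it the new frame would get a 0-based index, and pd.concat, which aligns on the index, would misalign the rows against an index that starts at 1 and fill the gaps with NaN.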


...