Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Convert 3d pandas DataFrame to Numpy ndarray

I've got a dataframe like

xs = pd.DataFrame({
    'batch1': {
        'timestep1': [1, 2, 3],
        'timestep2': [3, 2, 1]
    }
}).T

[Image: DataFrame where each cell is a list]

and I want to convert it into a numpy array of shape (batch,timestep,feature). For xs that should be (1,2,3).

The issue is that pandas only knows about the 2D shape, so to_numpy produces a 2D array.

xs.to_numpy().shape  # (1, 2)

Similarly, this prevents using np.reshape, because NumPy doesn't seem to see the innermost dimension as an array:

xs.to_numpy().reshape((1,2,3))  # ValueError: cannot reshape array of size 2 into shape (1,2,3)
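To see why the reshape fails, it may help to inspect what to_numpy actually returns. A minimal sketch using the same xs as above (the name arr is only for illustration): the array holds two Python list objects rather than six numbers, so NumPy counts two elements.

```python
import numpy as np
import pandas as pd

xs = pd.DataFrame({
    'batch1': {
        'timestep1': [1, 2, 3],
        'timestep2': [3, 2, 1]
    }
}).T

arr = xs.to_numpy()
print(arr.dtype)        # object: each cell is an opaque Python object
print(arr.shape)        # (1, 2)
print(type(arr[0, 0]))  # <class 'list'>
print(arr.size)         # 2, which is why reshape((1, 2, 3)) cannot work
```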

[Edit] Add context on how the dataframe arrived in this state.

The dataframe originally started as

xs = pd.DataFrame({
    ('batch1','timestep1'): {
            'feature1': 1,
            'feature2': 2,
            'feature3': 3
        },
    ('batch1', 'timestep2'): {
            'feature1': 3,
            'feature2': 2,
            'feature3': 1
        }
    }
).T

[Image: MultiIndex dataframe]

which I decomposed into the nested list/array using

xs.apply(pd.DataFrame.to_numpy, axis=1).unstack()

[Image: Unstacked dataframe]

question from:https://stackoverflow.com/questions/66048520/convert-3d-pandas-dataframe-to-numpy-ndarray


1 Answer

import pandas as pd

xs = pd.DataFrame({
    'batch1': {
        'timestep1': [1, 2, 3],
        'timestep2': [3, 2, 1]
    }
}).T

xs = pd.concat((xs.explode('timestep1').drop('timestep2', axis=1), xs.explode('timestep2').drop('timestep1', axis=1)), axis=1)
print(xs, '\n')

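# explode() stacks each list down the rows, so the frame is (feature, timestep).
# Transpose to (timestep, feature) before adding the batch axis; reshaping
# directly would interleave values from the two timesteps.
n = xs.to_numpy().T.reshape(1, 2, 3)
print(n)

Output:

       timestep1 timestep2
batch1         1         3
batch1         2         2
batch1         3         1 

[[[1 2 3]
  [3 2 1]]]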
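
An alternative sketch that avoids explode: convert the object array back to nested Python lists and let NumPy infer all three dimensions. This assumes every cell holds a list of the same length; with ragged lists NumPy cannot build a regular 3D array.

```python
import numpy as np
import pandas as pd

xs = pd.DataFrame({
    'batch1': {
        'timestep1': [1, 2, 3],
        'timestep2': [3, 2, 1]
    }
}).T

# tolist() gives [[[1, 2, 3], [3, 2, 1]]]; np.array then sees three nesting levels
n = np.array(xs.to_numpy().tolist())
print(n.shape)  # (1, 2, 3)
print(n)
# [[[1 2 3]
#   [3 2 1]]]
```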

EDIT

Starting from your original data frame you can do:

xs = pd.DataFrame({
    ('batch1','timestep1'): {
            'feature1': 1,
            'feature2': 2,
            'feature3': 3
        },
    ('batch1', 'timestep2'): {
            'feature1': 3,
            'feature2': 2,
            'feature3': 1
        },
    ('batch2','timestep1'): {
            'feature1': 4,
            'feature2': 5,
            'feature3': 6
        },
    ('batch2', 'timestep2'): {
            'feature1': 7,
            'feature2': 8,
            'feature3': 9
        }
    }
).T


array = xs.to_numpy().reshape(2,2,3)
print(array)

Output:

[[[1 2 3]
  [3 2 1]]

 [[4 5 6]
  [7 8 9]]]
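
If you would rather not hard-code the shape, it can be derived from the MultiIndex. A hedged sketch, assuming every batch has the same number of timesteps; the sort_index call and the names n_batches and n_features are additions for illustration, to make sure rows are ordered by batch and then by timestep before reshaping.

```python
import pandas as pd

xs = pd.DataFrame({
    ('batch1', 'timestep1'): {'feature1': 1, 'feature2': 2, 'feature3': 3},
    ('batch1', 'timestep2'): {'feature1': 3, 'feature2': 2, 'feature3': 1},
    ('batch2', 'timestep1'): {'feature1': 4, 'feature2': 5, 'feature3': 6},
    ('batch2', 'timestep2'): {'feature1': 7, 'feature2': 8, 'feature3': 9},
}).T

# Order rows by (batch, timestep) so the reshape groups them correctly
xs = xs.sort_index()

n_batches = xs.index.get_level_values(0).nunique()  # first index level = batch
n_features = xs.shape[1]                            # columns = features

# -1 lets NumPy work out the timestep dimension
array = xs.to_numpy().reshape(n_batches, -1, n_features)
print(array.shape)  # (2, 2, 3)
```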
