Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
246 views
in Technique[技术] by (71.8m points)

python - Determining Pandas Column DataType

Sometimes when data is imported to Pandas Dataframe, it always imports as type object. This is fine and well for doing most operations, but I am trying to create a custom export function, and my question is this:

  • Is there a way to force Pandas to infer the data types of the input data?
  • If not, is there a way after the data is loaded to infer the data types somehow?

I know I can tell Pandas that this is of type int, str, etc.. but I don't want to do that, I was hoping pandas could be smart enough to know all the data types when a user imports or adds a column.

EDIT - example of import

a = ['a']
col = ['somename']
df = pd.DataFrame(a, columns=col)
print(df.dtypes)
>>> somename    object
dtype: object

The type should be string?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

This is only a partial answer, but you can get frequency counts of the data type of the elements in a variable over the entire DataFrame as follows:

dtypeCount =[df.iloc[:,i].apply(type).value_counts() for i in range(df.shape[1])]

This returns

dtypeCount

[<class 'numpy.int32'>    4
 Name: a, dtype: int64,
 <class 'int'>    2
 <class 'str'>    2
 Name: b, dtype: int64,
 <class 'numpy.int32'>    4
 Name: c, dtype: int64]

It doesn't print this nicely, but you can pull out information for any variable by location:

dtypeCount[1]

<class 'int'>    2
<class 'str'>    2
Name: b, dtype: int64

which should get you started in finding what data types are causing the issue and how many of them there are.

You can then inspect the rows that have a str object in the second variable using

df[df.iloc[:,1].map(lambda x: type(x) == str)]

   a  b  c
1  1  n  4
3  3  g  6

data

df = DataFrame({'a': range(4),
                'b': [6, 'n', 7, 'g'],
                'c': range(3, 7)})

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...