python - Pandas: TypeError: '>' not supported between instances of 'int' and 'str' when selecting on date column

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a Pandas DataFrame with a column of Timestamps, and I can select date ranges from this column. But after I change other columns in the DataFrame, I can no longer do so; instead I get the error "TypeError: '>' not supported between instances of 'int' and 'str'".

The code below reproduces the problem:

1. Generate a DataFrame with some random numbers.
2. Add a column with dates.
3. Select on the date column.

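import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((200,3)))
df['date'] = pd.date_range('2000-1-1', periods=200, freq='D')
# String bounds work here because pandas parses them as Timestamps
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')
print(df.loc[mask])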

All good:

            0         1         2       date
153  0.280575  0.810817  0.534509 2000-06-02
154  0.490319  0.873906  0.465698 2000-06-03
155  0.070790  0.898340  0.390777 2000-06-04
156  0.896007  0.824134  0.134484 2000-06-05
157  0.539633  0.814883  0.976257 2000-06-06
158  0.772454  0.420732  0.499719 2000-06-07
159  0.498020  0.495946  0.546043 2000-06-08
160  0.562385  0.460190  0.480170 2000-06-09
161  0.924412  0.611929  0.459360 2000-06-10

However, if I now set column 0 to 0 wherever it exceeds 0.7 and repeat the selection:

df[df[0] > 0.7] = 0
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')

This gives the error:

TypeError: '>' not supported between instances of 'int' and 'str'

Why does this happen and how do I avoid it?

1 Answer

深蓝 · Answer 1 · 2021-10-23T20:00:54+0000

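You can compare a Timestamp (e.g. Timestamp('2000-01-01 00:00:00')) with a string because pandas converts the string to a Timestamp for you. However, df[df[0] > 0.7] = 0 sets every column of the matching rows to 0, including 'date', so those rows of the date column now hold the integer 0 instead of a Timestamp. Comparing an int with a str is not supported, hence the error.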
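Both halves of that explanation can be checked in isolation. The sketch below uses a made-up three-day Series for illustration: a datetime64 Series accepts a string bound, while a bare Python int cannot be ordered against a str, which is the comparison the overwritten rows of the date column end up making.

```python
import pandas as pd

# A datetime64 Series: pandas converts the string bound to a Timestamp.
dates = pd.Series(pd.date_range("2000-01-01", periods=3, freq="D"))
after_new_year = dates > "2000-01-01"
print(after_new_year.tolist())  # [False, True, True]

# Once a row of the date column holds the integer 0, the comparison becomes
# int vs str, which Python itself refuses to order.
try:
    result = 0 > "2000-6-1"
    msg = None
except TypeError as exc:
    msg = str(exc)
print(msg)  # '>' not supported between instances of 'int' and 'str'
```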

Another way around this is to change the order of your operations, building the date mask before overwriting any rows:

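# Compute both boolean masks while 'date' still holds only Timestamps
filters = df[0] > 0.7
mask = (df['date'] > '2000-6-1') & (df['date'] <= '2000-6-10')

# Overwriting rows afterwards does not change the already-computed masks
df[filters] = 0
print(df.loc[mask & filters])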

Also, you said you want to set column 0 to 0 where it exceeds 0.7, but df[df[0] > 0.7] = 0 does not do exactly that: it sets the entire rows to 0. To change only column 0, use:

df.loc[df[0] > 0.7, 0] = 0

This leaves the date column untouched, so the original mask works without any reordering.
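A minimal end-to-end sketch of the .loc fix, assuming the same setup as the question; the seeded generator (rng) is an addition only to make runs repeatable. Because only column 0 is written, the date column keeps its datetime64 dtype and the string bounds still parse:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed added only for repeatability
df = pd.DataFrame(rng.random((200, 3)))
df["date"] = pd.date_range("2000-1-1", periods=200, freq="D")

# Overwrite only column 0 in the matching rows; 'date' is not touched.
df.loc[df[0] > 0.7, 0] = 0

# 'date' is still datetime64, so pandas parses the string bounds as Timestamps.
mask = (df["date"] > "2000-6-1") & (df["date"] <= "2000-6-10")
print(df.loc[mask])
```

The date range alone decides which rows are selected (2000-06-02 through 2000-06-10), independent of the random values.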
