Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

python - In pandas, when using read_csv(), how do I assign NaN to a value that doesn't match the intended dtype?

Note: Please excuse my English; feel free to edit the title or the text below to make them clearer.

I have this line in my code:

moto = pd.read_csv('reporte.csv')

It emits the warning "DtypeWarning: Columns (2,3,4,5,6,7,8,9,10,12,13) have mixed types.", so I changed it to

import numpy as np
moto = pd.read_csv('reporte.csv', dtype={'TP': np.float64})

Now it raises ValueError: could not convert string to float: 'None'.

I checked the file (around 200K lines) in Excel and, indeed, found some cells containing the value "None".

So my question is: is there a way to ignore the error, or to force pandas to fill the offending values with NaN or something else?

I tried the solution here but it didn't work.

1 Answer

I tried creating a CSV to reproduce this error but couldn't on pandas 0.18, so I can only suggest two ways to handle it:

First

If you know that your missing values are all marked by the string 'None' (as in your file), then do this:

moto = pd.read_csv('reporte.csv', na_values=['None'])

You can add other markers that should become NaN to the na_values list as well. Note that the match is exact and case-sensitive, so 'none' would not catch 'None'.
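A minimal sketch combining na_values with the dtype from the question, assuming a small in-memory CSV in place of reporte.csv (the sample rows and the 'Other' column are made up for illustration). Because 'None' is turned into NaN while parsing, the float64 conversion no longer fails:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data standing in for reporte.csv
csv_text = "TP,Other\n1.5,a\nNone,b\n3,c\n"

moto = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["None"],        # exact, case-sensitive string match
    dtype={"TP": np.float64},  # succeeds because 'None' is already NaN
)

print(moto["TP"].tolist())  # [1.5, nan, 3.0]
print(moto["TP"].dtype)     # float64
```

If "None" is a legitimate value in other columns, na_values also accepts a dict such as {'TP': ['None']} to restrict the replacement to specific columns.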

Second

Try your first line again without using the dtype option.

moto = pd.read_csv('reporte.csv')

The read succeeds because a DtypeWarning is only a warning, not an error. Then run moto.dtypes to see which columns have the object dtype. For each one you want to convert, do the following:

moto['test_column'] = pd.to_numeric(moto['test_column'], errors='coerce')

The 'coerce' option converts any entries that can't be parsed as numbers, like 'None', to NaN.
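Since coercion silently discards whatever fails to parse, it can help to look at those values first. A small sketch, assuming an object column built directly here rather than read from reporte.csv (the values are made up):

```python
import pandas as pd

# Hypothetical object column, as read_csv might produce for mixed data
moto = pd.DataFrame({"TP": ["1.5", "None", "3", "abc"]})

raw = moto["TP"]
converted = pd.to_numeric(raw, errors="coerce")

# Values that were present before but became NaN were coerced
bad = raw[converted.isna() & raw.notna()]
print(bad.tolist())  # ['None', 'abc']

moto["TP"] = converted
print(moto["TP"].tolist())  # [1.5, nan, 3.0, nan]
```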

To convert the entire DataFrame at once, older pandas versions offered convert_objects; its convert_numeric option does the coercion to NaN:

moto = moto.convert_objects(convert_numeric=True)

Note that convert_objects was deprecated in favor of to_numeric and has since been removed, so on current pandas versions use to_numeric instead.
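On pandas versions where convert_objects is no longer available, a similar multi-column conversion can be sketched with DataFrame.apply and to_numeric. This is a substitute, not an exact equivalent: the column names below are hypothetical, and the conversion is limited to columns you expect to be numeric, because coercing a genuine text column would turn it entirely into NaN.

```python
import pandas as pd

moto = pd.DataFrame({
    "TP": ["1", "None", "2.5"],
    "Speed": ["10", "20", "None"],
    "Model": ["A", "B", "C"],  # genuine text column, left untouched
})

# Hypothetical: the columns you expect to hold numbers
numeric_cols = ["TP", "Speed"]

# apply() forwards errors="coerce" to pd.to_numeric for each column
moto[numeric_cols] = moto[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(moto.dtypes)
```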

After any of these methods, use fillna if you need to replace the NaNs with something else.
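For example, assuming a column that has already been converted to float (the values here are made up), fillna can replace the NaNs with a fixed value or with a computed one such as the column mean; which fill makes sense depends on your data:

```python
import numpy as np
import pandas as pd

moto = pd.DataFrame({"TP": [1.0, np.nan, 3.0]})

# Option 1: a fixed placeholder value
filled_zero = moto["TP"].fillna(0)

# Option 2: the mean of the non-missing values (mean() skips NaN)
filled_mean = moto["TP"].fillna(moto["TP"].mean())

print(filled_zero.tolist())  # [1.0, 0.0, 3.0]
print(filled_mean.tolist())  # [1.0, 2.0, 3.0]
```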