Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


datetime - R: strptime() and is.na() give unexpected results

I have a data frame with roughly 8 million rows and 3 columns. I used strptime() in the following manner:

df$date.time <- strptime(df$date.time, "%m/%d/%y %I:%M:%S %p")

This works fine for all but 1104 of the rows, which I checked using

df[is.na(df$date.time), ]

When I look at these "problem" rows, the date.time entries appear to be formatted as I would expect. For example, here is an observation that is flagged as NA but prints as a valid date-time:

id                date.time              outcome
observation543490 2012-03-11 02:14:01    C

Why does is.na(df$date.time) return TRUE for this row when it appears to have been converted correctly?

Here's a reproducible example using the US Central time zone (CST6CDT):

is.na(strptime("03/11/12 2:14:01 AM", "%m/%d/%y %I:%M:%S %p", "CST6CDT"))
#[1] TRUE

1 Answer

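The problem is likely that all the times that return NA do not exist in the time zone you're using, because of daylight saving time. In US Central time (CST6CDT), clocks jumped from 2:00 AM straight to 3:00 AM on March 11, 2012, so a local time such as 2012-03-11 02:14:01 never occurred. strptime() still fills in the individual date and time fields, which is why the row prints normally, but those fields cannot be mapped to a real instant in that zone, so is.na() returns TRUE.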
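One way to test this explanation is to parse the same strings in UTC, which has no DST gaps, and compare the result with the target zone. This is a minimal sketch, not the answerer's method: the helper `nonexistent_local` is hypothetical, it assumes an English or C locale (so `%p` matches "AM"/"PM"), and whether the local parse actually yields NA can depend on the R version and platform. On the asker's setup, the 2:14:01 AM string should be the one flagged.

```r
fmt <- "%m/%d/%y %I:%M:%S %p"
# Times around the 2012-03-11 US "spring forward" transition
x <- c("03/11/12 1:59:59 AM", "03/11/12 2:14:01 AM", "03/11/12 3:00:00 AM")

# Hypothetical helper: flags strings that are valid clock readings (they
# parse in UTC, which has no DST gaps) but become NA once interpreted in
# the target zone `tz`
nonexistent_local <- function(x, fmt, tz) {
  ok_utc <- !is.na(as.POSIXct(strptime(x, fmt, tz = "UTC")))
  na_local <- is.na(as.POSIXct(strptime(x, fmt, tz = tz)))
  ok_utc & na_local
}

flags <- nonexistent_local(x, fmt, tz = "CST6CDT")
x[flags]  # the strings that fall in the skipped hour, if any
```

To use it on the real data, run it on the original character column before overwriting df$date.time with the parsed result; if the hypothesis holds, all 1104 flagged rows should fall in the 2 AM hour of DST start dates.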

Check with the data source to determine the time zone the data were recorded in, then pass that value as the tz argument in your call to strptime().
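For example, if the source turns out to have logged times at a fixed offset with no DST (an assumption here; confirm it with the data provider), a fixed-offset zone such as `Etc/GMT+6` parses every row. Note that these POSIX-style names invert the sign, so `Etc/GMT+6` means UTC-6. The toy data frame below is an illustrative stand-in for `df`, and an English or C locale is assumed for `%p`.

```r
fmt <- "%m/%d/%y %I:%M:%S %p"

# Toy stand-in for the asker's data frame; values are illustrative only
df <- data.frame(
  id = c("obs1", "obs2"),
  date.time = c("03/10/12 11:59:00 PM", "03/11/12 2:14:01 AM"),
  outcome = c("A", "C"),
  stringsAsFactors = FALSE
)

# "Etc/GMT+6" is a fixed UTC-6 offset with no DST transitions,
# so no local clock time is skipped and nothing becomes NA
df$date.time <- as.POSIXct(strptime(df$date.time, fmt, tz = "Etc/GMT+6"))

stopifnot(!anyNA(df$date.time))
format(df$date.time, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Converting with as.POSIXct() also stores a compact numeric column instead of the list-based POSIXlt that strptime() returns, which is usually the better choice for an 8-million-row data frame.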