Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How R formats POSIXct with fractional seconds

I believe that R incorrectly formats POSIXct types with fractional seconds. I submitted this via R-bugs as an enhancement request and got brushed off with "we think the current behavior is correct -- bug deleted." While I am very appreciative of the work they have done and continue to do, I wanted to get other people's take on this particular issue, and perhaps advice on how to make the point more effectively.

Here is an example:

 > tt <- as.POSIXct('2011-10-11 07:49:36.3')
 > strftime(tt,'%Y-%m-%d %H:%M:%OS1')
 [1] "2011-10-11 07:49:36.2"

That is, tt is created as a POSIXct time with fractional part .3 seconds. When it is printed with one decimal digit, the value shown is .2. I work a lot with timestamps of millisecond precision and it causes me a lot of headaches that times are often printed one notch lower than the actual value.

Here is what is happening: POSIXct is a floating-point number of seconds since the epoch. All integer values are handled precisely, but in base-2 floating point, the closest value to .3 is very slightly smaller than .3. The stated behavior of strftime() for format %OSn is to round down to the requested number of decimal digits, so the displayed result is .2. For other fractional parts the floating point value is slightly above the value entered and the display gives the expected result:

 > tt <- as.POSIXct('2011-10-11 07:49:36.4')
 > strftime(tt,'%Y-%m-%d %H:%M:%OS1')
 [1] "2011-10-11 07:49:36.4"
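
One way to check this is to strip the class and look at the number that is actually stored. The following is only a sketch: it assumes the timestamp is parsed in UTC (set explicitly so the result does not depend on the local zone), and the variable names are illustrative.

```r
# Parse in UTC so the example does not depend on the local time zone.
tt <- as.POSIXct("2011-10-11 07:49:36.3", tz = "UTC")

# A POSIXct value is a double holding seconds since the epoch.
secs <- as.numeric(tt)

# Fractional part as stored; subtracting the integer part is exact.
frac <- secs - floor(secs)

print(sprintf("%.10f", frac))  # digits beyond the first decimal place
print(frac < 0.3)              # is the stored value below .3?
print(floor(frac * 10) / 10)   # result of truncating to one decimal digit
```

If the stored fraction is below .3, truncating to one digit can only produce .2, which is what %OS1 displays.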

The developers' argument is that for time types we should always round down to the requested precision. For example, if the time is 11:59:59.8 then printing it with format %H:%M should give "11:59" not "12:00", and %H:%M:%S should give "11:59:59" not "12:00:00". I agree with this for integer numbers of seconds and for format flag %S, but I think the behavior should be different for format flags that are designed for fractional parts of seconds. I would like to see %OSn use round-to-nearest behavior even for n = 0 while %S uses round-down, so that printing 11:59:59.8 with format %H:%M:%OS0 would give "12:00:00". This would not affect anything for integer numbers of seconds because those are always represented precisely, but it would more naturally handle round-off errors for fractional seconds.

This is the distinction C already makes for floating-point values: an integer cast rounds down, while printf with a %.nf format rounds to nearest:

 double x = 9.97;
 printf("%d\n",(int) x);   //  9
 printf("%.0f\n",x);       //  10
 printf("%.1f\n",x);       //  10.0
 printf("%.2f\n",x);       //  9.97
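
The same contrast can be sketched in R on the seconds value from the 11:59:59.8 case. Note that truncate_to() is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper: drop everything past `digits` decimal places.
truncate_to <- function(x, digits) floor(x * 10^digits) / 10^digits

x <- 59.8                  # seconds part of 11:59:59.8
print(truncate_to(x, 0))   # round-down policy: stays within the current minute
print(round(x, 0))         # round-to-nearest policy: reaches 60, so a carry is needed
```

The second result is the reason round-to-nearest cannot be applied to the seconds alone: a value of 60 has to carry into the minutes.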

I did a quick survey of how fractional seconds are handled in other languages and environments, and there really doesn't seem to be a consensus. Most constructs are designed for integer numbers of seconds and the fractional parts are an afterthought. It seems to me that in this case the R developers made a choice that is not completely unreasonable but is in fact not the best one, and is not consistent with the conventions elsewhere for displaying floating-point numbers.

What are people's thoughts? Is the R behavior correct? Is it the way you yourself would design it?

1 Answer

One underlying problem is that the POSIXct representation is less precise than the POSIXlt representation, and the POSIXct representation gets converted to the POSIXlt representation before formatting. Below we see that if our string is converted directly to POSIXlt representation, it outputs correctly.

> as.POSIXct('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.2 CDT"
> as.POSIXlt('2011-10-11 07:49:36.3')
[1] "2011-10-11 07:49:36.3"

We can also see this by taking the fractional part stored in each representation and comparing it with the usual floating-point representation of 0.3.

> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> as.numeric(t1 - round(unclass(t1))) - 0.3
[1] -4.768372e-08

> t2 <- as.POSIXlt('2011-10-11 07:49:36.3')
> as.numeric(t2$sec - round(unclass(t2$sec))) - 0.3
[1] -2.831069e-15

Interestingly, it looks like both representations are actually less than the usual representation of 0.3, but that the second one is either close enough, or truncates in a way different than I'm imagining here. Given that, I'm not going to worry about floating point representation difficulties; they may still happen, but if we're careful about which representation we use, they will hopefully be minimized.
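
The size of the two errors is consistent with the spacing of double-precision values at the magnitudes involved: POSIXct keeps roughly 1.3e9 seconds in one double, while the sec field of POSIXlt only ever holds a small number. A rough sketch, assuming IEEE 754 doubles with 52 fraction bits; ulp() is a hypothetical helper, not part of base R, and the two constants are just values of the right magnitude.

```r
# Hypothetical helper: gap between adjacent doubles near a positive value x,
# assuming IEEE 754 double precision (52 fraction bits).
ulp <- function(x) 2^(floor(log2(x)) - 52)

epoch_secs <- 1318319376.3   # magnitude of a 2011 timestamp stored as POSIXct
sec_field  <- 36.3           # magnitude of the POSIXlt `sec` component

print(ulp(epoch_secs))  # on the order of 1e-07
print(ulp(sec_field))   # on the order of 1e-15
```

A stored value can be off by up to half of that gap, so an error near 5e-08 in POSIXct and one near 3e-15 in POSIXlt are both what nearest-value rounding would produce.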

The OP's desire for rounded output is then simply an output problem, and could be addressed in any number of ways. My suggestion would be something like this:

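myformat.POSIXct <- function(x, digits=0) {
  # Round the epoch seconds first so minute/hour/day boundaries carry correctly
  x2 <- round(unclass(x), digits)
  # Restore the class and tzone attributes dropped by unclass()
  attributes(x2) <- attributes(x)
  x <- as.POSIXlt(x2)
  # Round again in the more precise POSIXlt seconds field
  x$sec <- round(x$sec, digits)
  format.POSIXlt(x, paste("%Y-%m-%d %H:%M:%OS",digits,sep=""))
}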

This starts with a POSIXct input, and first rounds to the desired digits; it then converts to POSIXlt and rounds again. The first rounding makes sure that all of the units increase appropriately when we are on a minute/hour/day boundary; the second rounding rounds after converting to the more precise representation.

> options(digits.secs=1)
> t1 <- as.POSIXct('2011-10-11 07:49:36.3')
> format(t1)
[1] "2011-10-11 07:49:36.2"
> myformat.POSIXct(t1,1)
[1] "2011-10-11 07:49:36.3"

> t2 <- as.POSIXct('2011-10-11 23:59:59.999')
> format(t2)
[1] "2011-10-11 23:59:59.9"
> myformat.POSIXct(t2,0)
[1] "2011-10-12 00:00:00"
> myformat.POSIXct(t2,1)
[1] "2011-10-12 00:00:00.0"

A final aside: Did you know the standard allows for up to two leap seconds?

> as.POSIXlt('2011-10-11 23:59:60.9')
[1] "2011-10-11 23:59:60.9"

OK, one more thing. The behavior actually changed in May due to a bug filed by the OP (Bug 14579); before that it did round fractional seconds. Unfortunately that meant that sometimes it could round up to a second that wasn't possible; in the bug report, it went up to 60 when it should have rolled over to the next minute. One reason the decision was made to truncate instead of round is that it's printing from the POSIXlt representation, where each unit is stored separately. Thus rolling over to the next minute/hour/etc is more difficult than just a straightforward rounding operation. To round easily, it's necessary to round in POSIXct representation and then convert back, as I suggest.
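
The rollover problem can be reproduced by hand. The sketch below rounds the sec field of a POSIXlt value in isolation, then rounds the same instant as epoch seconds and converts back. UTC is set explicitly as an assumption, to keep the result independent of the local zone, and the variable names are illustrative.

```r
lt <- as.POSIXlt("2011-10-11 23:59:59.9", tz = "UTC")

# Rounding the seconds field alone leaves the other fields untouched.
naive <- lt
naive$sec <- round(naive$sec)
print(c(min = naive$min, sec = naive$sec))

# Rounding the epoch seconds first lets the conversion carry into the date.
ct <- as.POSIXct(lt)
rounded_ct <- as.POSIXct(round(as.numeric(ct)), origin = "1970-01-01", tz = "UTC")
rolled <- as.POSIXlt(rounded_ct, tz = "UTC")
print(c(mday = rolled$mday, hour = rolled$hour, min = rolled$min, sec = rolled$sec))
```

In the first case the seconds reach 60 while the minute stays at 59; in the second the carry runs all the way through to the next day.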

