Re: Patch: Avoid precision error in to_timestamp().

From: Erik Nordström <erik(dot)nordstrom(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch: Avoid precision error in to_timestamp().
Date: 2017-02-09 15:30:57
Message-ID: CAHuQZDQyT4Qc+hCbmf+N-x_8UUQ30Bncn=wJ002aos5QKqK9pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom,

You are of course right that you cannot add precision that was not there to
begin with. My patch does nothing for input values that cannot be
represented accurately to begin with. However, that was not my main point.

The idea with the calculation is this: When you remove the integral/seconds
part of the value before converting to integral microseconds, you are
creating a number that can be represented with a float at higher accuracy
compared to the original value (i.e., simply because there are less digits
in the mantissa/significand). Thus when doing 0.312311172485352 *
USECS_PER_SEC, you suffer less precision-related errors with regards to
microseconds than when doing 14864803242.312311172485352 * USECS_PER_SEC as
in the original code. In other words, with this approach there are fewer
cases when 1 ULP is bigger than 1 microsecond.

FWIW, this is the output from latest postgres HEAD, which includes your
rint() patch:

postgres=# select to_timestamp(14864803242.312311);
to_timestamp
-------------------------------
2441-01-17 09:00:42.312312+01
(1 row)

And this is the output with my patch:

postgres=# select to_timestamp(14864803242.312311);
to_timestamp
-------------------------------
2441-01-17 09:00:42.312311+01
(1 row)

I don't know why you would get different output (that would be worrying in
itself).

In any case, I would prefer a to_timestamp(numeric), although I see no
reason not to make the float8 version as accurate as possible anyway.

-Erik

On Thu, Feb 9, 2017 at 3:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> =?UTF-8?Q?Erik_Nordstr=C3=B6m?= <erik(dot)nordstrom(at)gmail(dot)com> writes:
> > Thanks for the insightful feedback. You are right, the patch does suffer
> > from overflow (and other possible issues when I think about it). Using
> > rint(), as you suggest, helps in my original example case, but there are
> > still cases when the output is not what you would expect. For instance,
> > converting the Unix time 14864803242.312311 gives back the timestamp
> > "2441-01-17 09:00:42.312312+01", even if using rint() (see below).
>
> Hm, that particular case works for me using the patch I committed
> yesterday (which just added a rint() call to the existing code).
> I'm a bit skeptical of the version you propose here because it seems
> mighty prone to subtractive cancellation. You're basically computing
> x - int(x) which can't add any precision that wasn't there before.
> Looking at successive microsecond values in this range:
>
> regression=# select 14864803242.312310::float8 - 14864803242;
> ?column?
> -------------------
> 0.312309265136719
> (1 row)
>
> regression=# select 14864803242.312311::float8 - 14864803242;
> ?column?
> -------------------
> 0.312311172485352
> (1 row)
>
> regression=# select 14864803242.312312::float8 - 14864803242;
> ?column?
> -------------------
> 0.312311172485352
> (1 row)
>
> regression=# select 14864803242.312313::float8 - 14864803242;
> ?column?
> -------------------
> 0.312313079833984
> (1 row)
>
> Basically, 1 ULP in this range is more than 1 microsecond, so there are
> going to be places where you cannot get the right answer. Reformulating
> the computation will just shift the errors to nearby values. float8
> simply doesn't have enough bits to represent this many microseconds since
> 1970 exactly, and the problem's only going to get worse the further out
> you look.
>
> I think we might be better advised to add to_timestamp(numeric) alongside
> to_timestamp(float8). There's plenty of precedent for that (e.g, exp(),
> ln()) so I would not expect problems with ambiguous function calls.
> It'd be slower though ...
>
> regards, tom lane
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-02-09 15:58:21 Re: ICU integration
Previous Message Tom Lane 2017-02-09 14:56:29 Re: Patch: Avoid precision error in to_timestamp().