Re: Patch: Avoid precision error in to_timestamp().

From: Erik Nordström <erik(dot)nordstrom(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch: Avoid precision error in to_timestamp().
Date: 2017-02-09 14:12:17
Message-ID: CAHuQZDQspkizCX9eiMaxV6hV66c_cvXE_hCVMVU2fCb2ZiWBuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom, and others,

Thanks for the insightful feedback. You are right, the patch does suffer
from overflow (and other possible issues when I think about it). Using
rint(), as you suggest, helps in my original example case, but there are
still cases when the output is not what you would expect. For instance,
converting the Unix time 14864803242.312311 gives back the timestamp
"2441-01-17 09:00:42.312312+01", even if using rint() (see below).

I guess precision-related errors are unavoidable when working with floating
point numbers (especially large ones). But I am wondering if it is not
desirable to avoid (or reduce) errors at least in the case when the input
value can be accurately represented to the microsecond by a float (i.e.,
when rounded to microsecond, as in printf)? For example, splitting up the
float into second and microsecond integers might help reduce errors,
especially with large numbers. Something like this:

ts_seconds = (int64)seconds;
ts_microseconds = (int64)rint((seconds - ts_seconds) * USECS_PER_SEC);
result = (ts_seconds - epoch_diff_seconds) * USECS_PER_SEC +
ts_microseconds;

Note that stripping of the full seconds before scaling the microsecond
fractional part will keep it more accurate, and it should solve the problem
for the case above, including overflow.

Or, maybe I am overthinking this and the only proper solution is to provide
a to_timestamp() that takes a microsecond integer? This certainly wouldn't
hurt in any case an makes sense since the timestamp is itself an integer
internally.

Anyway, I am attaching an updated version of the path. Below are some
example outputs (including min and max allowed input) from original
Postgres, with your rint() fix, and the included patch:

Original postgres:

test=# select to_timestamp(9224318015999);
to_timestamp
---------------------------------
294277-01-01 00:59:58.999552+01
(1 row)

test=# select to_timestamp(-210866803200);
to_timestamp
---------------------------------
4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
to_timestamp
-------------------------------
2017-02-07 16:12:04.236537+01
(1 row)

test=# select to_timestamp(14864803242.312311);
to_timestamp
-------------------------------
2441-01-17 09:00:42.312312+01
(1 row)

With rint():

test=# select to_timestamp(9224318015999);
to_timestamp
---------------------------------
294277-01-01 00:59:58.999552+01
(1 row)

test=# select to_timestamp(-210866803200);
to_timestamp
---------------------------------
4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
to_timestamp
-------------------------------
2017-02-07 16:12:04.236538+01
(1 row)

test=# select to_timestamp(14864803242.312311);
to_timestamp
-------------------------------
2441-01-17 09:00:42.312312+01
(1 row)

Included patch:

test=# select to_timestamp(9224318015999);
to_timestamp
--------------------------
294277-01-01 00:59:59+01
(1 row)

test=# select to_timestamp(-210866803200);
to_timestamp
---------------------------------
4714-11-24 01:12:12+01:12:12 BC
(1 row)

test=# select to_timestamp(1486480324.236538);
to_timestamp
-------------------------------
2017-02-07 16:12:04.236538+01
(1 row)

test=# select to_timestamp(14864803242.312311);
to_timestamp
-------------------------------
2441-01-17 09:00:42.312311+01
(1 row)

--Erik

On Wed, Feb 8, 2017 at 9:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > I wonder if we could make things better just by using rint() rather than
> > a naive cast-to-integer. The cast will truncate not round, and I think
> > that might be what's mostly biting you. Does this help for you?
>
> > #ifdef HAVE_INT64_TIMESTAMP
> > - result = seconds * USECS_PER_SEC;
> > + result = rint(seconds * USECS_PER_SEC);
> > #else
>
> I poked around looking for possible similar issues elsewhere, and found
> that a substantial majority of the datetime-related code already uses
> rint() when trying to go from float to int representations. As far as
> I can find, this function and make_interval() are the only ones that
> fail to do so. So I'm now thinking that this is a clear oversight,
> and both those places need to be patched to use rint().
>
> regards, tom lane
>

Attachment Content-Type Size
to_timestamp_fix_2.patch application/octet-stream 1.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2017-02-09 14:48:25 Re: One-shot expanded output in psql using \gx
Previous Message Ashutosh Sharma 2017-02-09 13:56:41 Re: pageinspect: Hash index support