Re: Ryu floating point output patch

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, "pgsql-hackers\(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ryu floating point output patch
Date: 2019-01-15 10:37:58
Message-ID: 871s5emitx.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Andres" == Andres Freund <andres(at)anarazel(dot)de> writes:

>> In particular, how does it know how every strtod() on the planet
>> will react to specific input?

Andres> strtod()'s job ought to computationally be significantly easier
Andres> than the other way round, no? And if there's buggy strtod()
Andres> implementations out there, why would they be guaranteed to do
Andres> the correct thing with our current output, but not with ryu's?

Funny thing: I've been devoting considerable effort to testing this, and
the one failure mode I've found is very interesting; it's not a problem
with strtod(), in fact it's a bug in our float4in caused by _misuse_ of
strtod().

In particular, float4in thinks it's ok to do strtod() on the input, and
then round the result to a float. It is wrong to think that, and here's
why:

Consider the (float4) bit pattern x15ae43fd. The true decimal value of
this, and its adjacent values are:

x15ae43fc = 7.0385300755536269169437150392273653469292493678466371420654468238353729248046875e-26
midpoint 7.0385303837024180189014515281838361605176203339429008565275580622255802154541015625e-28
x15ae43fd = 7.038530691851209120859188017140306974105991300039164570989669300615787506103515625e-26
midpoint 7.0385310000000002228169245060967777876943622661354282854517805390059947967529296875e-26
x15ae43fe = 7.03853130814879132477466099505324860128273323223169199991389177739620208740234375e-26

Now look at what happens if you pass '7.038531e-26' as input for float4.

From the above values, the correct result is x15ae43fd, because any
other value would be strictly further away. But if you input the value
as a double first, here are the adjacent representations:

x3ab5c87fafffffff = 7.038530999999999074873222531206633286975087635036133540...e-26
midpoint 7.0385309999999996488450735186517055373347249505857809130...e-28
x3ab5c87fb0000000 = 7.03853100000000022281692450609677778769436226613542828545...e-28
midpoint 7.03853100000000079678877549354185003805399958168507565784...e-28
x3ab5c87fb0000001 = 7.03853100000000137076062648098692228841363689723472303024...e-28

So the double value is x3ab5c87fb0000000, which has a mantissa of
(1).01011100100001111111101100000...

which when rounded to 23 bits (excluding the implied (1)), is
010 1110 0100 0011 1111 1110 = 2e43fe

So the rounded result is incorrectly x15ae43fe, rather than the expected
correct value of x15ae43fd.

strtof() exists as part of the relevant standards, so float4in should be
using that in place of strtod, and that seems to fix this case for me.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2019-01-15 10:42:08 Re: Log a sample of transactions
Previous Message Masahiko Sawada 2019-01-15 10:16:54 Re: Synchronous replay take III