Re: Performance improvements for src/port/snprintf.c

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Subject: Re: Performance improvements for src/port/snprintf.c
Date: 2018-10-07 11:59:18
Message-ID: 878t3a9w7r.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> Now, "shortest value that converts back exactly" is technically
Tom> cool, but I am not sure that it solves any real-world problem that
Tom> we have.

Well, it seems to me that it is perfect for pg_dump.

Also it's kind of a problem that our default float output is not
round-trip safe - people do keep wondering why they can select a row and
it'll show a certain value, but then doing WHERE col = 'xxx' on that
value does not find the row. Yes, testing equality of floats is bad, but
there's no reason to put in extra landmines.
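
A minimal C sketch of that landmine, assuming the default output is
equivalent to printf's %.*g with DBL_DIG (15) significant digits. The
helper names are hypothetical and exist only for this illustration.

```c
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format v with ndigits significant digits, parse the text back with
 * strtod, and return 1 if the parsed value equals v exactly, else 0.
 */
int
roundtrips(double v, int ndigits)
{
    char buf[64];

    snprintf(buf, sizeof(buf), "%.*g", ndigits, v);
    return strtod(buf, NULL) == v;
}

/* Assumption: the default output uses DBL_DIG significant digits. */
int
default_output_roundtrips(double v)
{
    return roundtrips(v, DBL_DIG);
}
```

For v = 0.1 + 0.2 the 15-digit text is "0.3", which parses to a different
double, so default_output_roundtrips(v) returns 0, while roundtrips(v, 17)
returns 1. That is the case where the displayed value fails to match in a
WHERE clause.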

Tom> I'm also worried that introducing it would result in complaints like
Tom> https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com

Frankly for a >20x performance improvement in float8out I don't think
that's an especially big deal.

Tom> As for #2, my *very* short once-over of the code led me to think
Tom> that the speed win comes mostly from use of wide integer
Tom> arithmetic,

Data point: forcing it to use 64-bit only (#define RYU_ONLY_64_BIT_OPS)
makes negligible difference on my test setup.

Tom> and maybe from throwing big lookup tables at the problem. If so,
Tom> it's very likely possible that we could adopt those techniques
Tom> without necessarily buying into the shortest-exact rule for how
Tom> many digits to print.

If you read the ACM paper (linked from the upstream github repo), it
explains how the algorithm works by combining the radix conversion step
with (the initial iterations of) the operation of finding the shortest
representation. This allows limiting the number of bits needed for the
intermediate results so that it can all be done in fixed-size integers,
rather than using an arbitrary-precision approach.

I do not see any obvious way to use this code to generate the same
output in the final digits that we currently do (in the sense of
overly-exact values like outputting 1.89999999999999991 for 1.9 when
extra_float_digits=3).
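
To make the difference in final digits concrete, here is a rough C sketch.
It is not Ryu: shortest_roundtrip is a hypothetical brute-force stand-in
that simply retries printf with more digits until the text parses back
exactly, and fixed_digits assumes that extra_float_digits=3 amounts to
%.18g (DBL_DIG + 3).

```c
#include <stdio.h>
#include <stdlib.h>

/* Fixed-precision output: always emit up to ndigits significant digits. */
void
fixed_digits(double v, int ndigits, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%.*g", ndigits, v);
}

/*
 * Brute-force stand-in for "shortest value that converts back exactly":
 * try 1..17 significant digits and stop at the first text that strtod
 * maps back to v. 17 digits always suffice for a finite double.
 */
void
shortest_roundtrip(double v, char *buf, size_t buflen)
{
    int ndigits;

    for (ndigits = 1; ndigits <= 17; ndigits++)
    {
        snprintf(buf, buflen, "%.*g", ndigits, v);
        if (strtod(buf, NULL) == v)
            return;
    }
}
```

For 1.9, fixed_digits with 18 digits yields 1.89999999999999991, while
shortest_roundtrip yields 1.9. Both parse back to the same double; only
the trailing digits differ, and a shortest-representation algorithm has
no natural way to produce the longer form.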

>> One option would be to stick with snprintf if extra_float_digits is
>> less than 0 (or less than or equal to 0 and make the default 1) and
>> use ryu otherwise, so that the option to get rounded floats is still
>> there. (Apparently some people do use negative values of
>> extra_float_digits.) Unlike other format-changing GUCs, this one
>> already exists and is already used by people who want more or less
>> precision, including by pg_dump where round-trip conversion is the
>> requirement.

Tom> I wouldn't necessarily object to having some value of
Tom> extra_float_digits that selects the shortest-exact rule, but I'm
Tom> thinking maybe it should be a value we don't currently accept.

Why would anyone currently set extra_float_digits > 0 if not to get
round-trip-safe values?

--
Andrew (irc:RhodiumToad)
