Re: Performance improvements for src/port/snprintf.c

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Subject: Re: Performance improvements for src/port/snprintf.c
Date: 2018-10-03 15:52:07
Message-ID: 20181003155207.b3lqmovuv2c5c4id@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-10-03 08:20:14 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> >> While there might be value in implementing our own float printing code,
> >> I have a pretty hard time getting excited about the cost/benefit ratio
> >> of that. I think that what we probably really ought to do here is hack
> >> float4out/float8out to bypass the extra overhead, as in the 0002 patch
> >> below.
>
> > I'm thinking we should do a bit more than just that hack. I'm thinking
> > of something (barely tested) like
>
> Meh. The trouble with that is that it relies on the platform's snprintf,
> not sprintf, and that brings us right back into a world of portability
> hurt. I don't feel that the move to C99 gets us out of worrying about
> noncompliant snprintfs --- we're only requiring a C99 *compiler*, not
> libc. See buildfarm member gharial for a counterexample.

Oh, we could just use sprintf() and tell strfromd the buffer is large
enough. I only used snprintf because it seemed more symmetric, and
because I was at most 1/3 awake.

> I'm happy to look into whether using strfromd when available buys us
> anything over using sprintf. I'm not entirely convinced that it will,
> because of the need to ASCII-ize and de-ASCII-ize the precision, but
> it's worth checking.

It's definitely faster. It's not a full-blown format parser, so I guess
the cost of the conversion isn't too bad:
https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strfrom-skeleton.c;hb=HEAD#l68
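
To make the idea concrete, here is a rough sketch of what such a helper might look like. This is not the code from the patch: the name double_to_string, the DOUBLE_STR_BUFLEN constant, the HAVE_STRFROMD test, and the 0..17 precision range are all assumptions made for illustration. It uses strfromd() where glibc 2.25 or later provides it and plain sprintf() with "%.*g" elsewhere; in both cases the caller promises that the buffer is large enough. The strfromd branch also shows the "ASCII-izing" of the precision that Tom mentions, since strfromd only accepts a literal precision inside the format string.

```c
#define _GNU_SOURCE             /* exposes strfromd() on glibc; harmless elsewhere */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: strfromd() is present in glibc 2.25 and later. */
#if defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 25))
#define HAVE_STRFROMD 1
#endif

/* Enough for "%.17g" output ("-1.7976931348623157e+308") plus the NUL. */
#define DOUBLE_STR_BUFLEN 32

/*
 * Hypothetical helper: write "value" in %.<precision>g format into buf,
 * which must hold at least DOUBLE_STR_BUFLEN bytes.  Returns the length
 * of the resulting string.  Precision is clamped to 0..17 in this sketch.
 */
int
double_to_string(char *buf, double value, int precision)
{
    int         len;

    if (precision < 0)
        precision = 0;
    else if (precision > 17)
        precision = 17;

#ifdef HAVE_STRFROMD
    {
        /* strfromd() takes no "*" precision, so build "%.<n>g" by hand. */
        char        fmt[6];
        int         i = 0;

        fmt[i++] = '%';
        fmt[i++] = '.';
        if (precision >= 10)
            fmt[i++] = (char) ('0' + precision / 10);
        fmt[i++] = (char) ('0' + precision % 10);
        fmt[i++] = 'g';
        fmt[i] = '\0';
        len = strfromd(buf, DOUBLE_STR_BUFLEN, fmt, value);
    }
#else
    /* Fallback: sprintf() with a literal format; no snprintf() needed. */
    len = sprintf(buf, "%.*g", precision, value);
#endif

    assert(len > 0 && len < DOUBLE_STR_BUFLEN);
    return len;
}
```

A caller would declare char buf[DOUBLE_STR_BUFLEN] and call double_to_string(buf, x, 17). Whether building the format string by hand costs anything measurable is exactly the question the timings below are meant to answer.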

CREATE TABLE somefloats(id serial, data1 float8, data2 float8, data3 float8);
INSERT INTO somefloats(data1, data2, data3) SELECT random(), random(), random() FROM generate_series(1, 10000000);
VACUUM FREEZE somefloats;

I'm comparing the times of:
COPY somefloats TO '/dev/null';

master (including your commit):
16177.202 ms

snprintf using sprintf via pg_double_to_string:
16195.787 ms

snprintf using strfromd via pg_double_to_string:
14856.974 ms

float8out using sprintf via pg_double_to_string:
16176.169 ms

float8out using strfromd via pg_double_to_string:
13532.698 ms

FWIW, it seems that using a local buffer and then pstrdup'ing that in
float8out_internal is a bit faster, and would probably save a bit of
memory on average:

float8out using sprintf via pg_double_to_string, pstrdup:
15370.774 ms

float8out using strfromd via pg_double_to_string, pstrdup:
13498.331 ms
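
For reference, the local-buffer-then-pstrdup variant amounts to something like the following. This is only a standalone sketch, not the float8out_internal code: double_out_dup is a made-up name, the 32-byte buffer and the 1..17 digit range are assumptions, and malloc()/memcpy() stand in for palloc()/pstrdup(), which exist only inside the backend. The point is that the result allocation is sized to the actual string rather than to the worst case.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the float8out_internal variant: format into a
 * stack buffer first, then copy only the bytes actually used.  Returns a
 * malloc'd string the caller must free(), or NULL on allocation failure.
 */
char *
double_out_dup(double value, int ndig)
{
    char        local[32];      /* enough for %.17g output plus the NUL */
    int         len;
    char       *result;

    /* Assumption for this sketch: keep the digit count within 1..17. */
    if (ndig < 1)
        ndig = 1;
    else if (ndig > 17)
        ndig = 17;

    len = sprintf(local, "%.*g", ndig, value);
    assert(len > 0 && (size_t) len < sizeof(local));

    /* Exact-size copy, the moral equivalent of pstrdup(local). */
    result = malloc((size_t) len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, local, (size_t) len + 1);
    return result;
}
```

For a typical short value the copy is only a handful of bytes, which is presumably where the memory saving comes from.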

Greetings,

Andres Freund
