Re: WIP: Make timestamptz_out less slow.

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Make timestamptz_out less slow.
Date: 2015-07-27 22:59:15
Message-ID: CAKJS1f8sktwXF2L_C2RD64gNbry3Gq2_zp-kr9mpgw=WV+tY_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28 July 2015 at 09:17, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> I recently once more noticed that timestamptz_out is really, really
> slow. To quantify that, I created a bunch of similar sized tables:
>
> CREATE TABLE tbl_timestamp AS SELECT NOW() FROM generate_series(1, 100000)
> a, generate_series(1, 100) b;
> CREATE TABLE tbl_int8 AS SELECT 1::bigint FROM generate_series(1, 100000)
> a, generate_series(1, 100) b;
> CREATE TABLE tbl_bytea AS SELECT ' '::bytea FROM generate_series(1,
> 100000) a, generate_series(1, 100) b;
>
> These all end up being 346MB large.
>
> COPY tbl_bytea TO '/dev/null';
> Time: 1173.484 ms
> COPY tbl_int8 TO '/dev/null';
> Time: 1030.756 ms
> COPY tbl_timestamp TO '/dev/null';
> Time: 6598.030
>
> (all best of three)
>
> Yes, timestamp outputs more data as a whole, but being 5 times as slow
> is still pretty darn insane. To make sure that's not the cause, here's
> another table:
>
> CREATE TABLE tbl_timestamptext AS SELECT NOW()::text FROM
> generate_series(1, 100000) a, generate_series(1, 100) b;
> COPY tbl_timestamptext TO '/dev/null';
> Time: 1449.554 ms
>
> So it's really just the timestamp code.
>
>
> Profiling it shows:
> Overhead Command Shared Object Symbol
> - 38.33% postgres_stock libc-2.19.so [.] vfprintf
> - 97.92% vfprintf
> _IO_vsprintf
> - sprintf
> + 70.25% EncodeDateTime
> + 29.75% AppendSeconds.constprop.10
> + 1.11% _IO_default_xsputn
> - 8.22% postgres_stock libc-2.19.so [.] _IO_default_xsputn
> - 99.43% _IO_default_xsputn
> - 90.09% vfprintf
> _IO_vsprintf
> - sprintf
> + 74.15% EncodeDateTime
> + 25.85% AppendSeconds.constprop.10
> + 9.72% _IO_padn
> + 0.57% vfprintf
> + 7.76% postgres_stock postgres_stock [.] CopyOneRowTo
>
> So nearly all the time is spent somewhere inside the sprintf calls. Not
> nice.
>
> The only thing I could come up to make the sprintfs cheaper was to
> combine them into one and remove some of the width specifiers that
> aren't needed. That doesn't buy us very much.
>
> I then proceeded to replace the sprintf call with hand-rolled
> conversions. And wow, the benefit is far bigger than I'd assumed:
> postgres[7236][1]=# COPY tbl_timestamp TO '/dev/null';
> Time: 2430.521 ms
>
> So, by hand-rolling the ISO conversion in EncodeDateTime() we got a
> ~250% performance improvement. I'd say that's worthwhile.
>
> The attached patch shows what I did. While there's some polishing
> possible, as a whole, it's pretty ugly. But I think timestamp data is so
> common that it's worth the effort.
>
> Does anybody have a fundamentally nicer idea than the attached to
> improvide this?
>

It won't be quite as fast as what you've written, but I think it will be
much neater and more likely to be used in other places if we invent a
function like pg_ltoa() which returns a pointer to the new end of string.

Also if we're specifying padding with zeros then we can skip the reverse
part that's in pg_ltoa(), (normally needed since the numeric string is
build in reverse)

The code could then be written as:

str = pg_int2str_pad(str, year, 4);
*str++ = '-';
str = pg_int2str_pad(str, tm->tm_mon, 2);
*str++ = '-';
str = pg_int2str_pad(str, tm->tm_mday, 2);

etc

I've used this method before and found it to be about 10 times faster than
snprintf(), but I was reversing the string, so quite likely it be more than
10 times.

I'm interested to see how much you're really gaining by manually unrolling
the part that builds the fractional part of the second.

We could just build that part with: (untested)

if (fsec != 0)
{
int fseca = abs(fsec);
while (fseca % 10 == 0 && fseca > 0)
fseca /= 10;
*str++ = '.';
str = pg_int2str(str, fseca);
}

Regards

David Rowley

--
David Rowley http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2015-07-27 23:22:45 Re: Optimization idea: merging multiple EXISTS(...) with constraint-based join removal
Previous Message Josh Berkus 2015-07-27 22:56:03 Re: Autonomous Transaction is back