Re: WIP: Make timestamptz_out less slow.

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Make timestamptz_out less slow.
Date: 2015-11-05 02:30:47
Message-ID: CAKJS1f8GvjqFx7SaLTvSLyzfw_SGmcC3MCGUb0=N9n=Xq9Ax6w@mail.gmail.com
Lists: pgsql-hackers

On 5 November 2015 at 13:20, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Fri, Sep 11, 2015 at 9:00 AM, David Rowley
> <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > Attached is also timestamp_delta.patch which changes pg_ltostr() to use
> > the INT_MIN special case that pg_ltoa also uses. I didn't make a similar
> > change to pg_ltostr_zeropad() as I'd need to add even more special code
> > to prepend the correct number of zeros before the 2147483648. This zero
> > padding reason was why I originally came up with the alternative way to
> > handle the negative numbers. I had just thought that it was best to keep
> > both functions as similar as possible.
>
> I've started looking at this -- "timestamp_out_speedup_2015-09-12.patch".
>
> > I've not done anything yet to remove the #ifdef HAVE_INT64_TIMESTAMP from
> > AppendSeconds(). The only way I can think to handle this is to just make
> > fsec_t unconditionally an int64 (since with float datetimes the precision
> > is 10 decimal digits after the dec point, this does not fit in int32).
> > I'd go off and do this, but this means I need to write an int64 version
> > of pg_ltostr(). Should I do it this way?
>
> Have you thought about *just* having an int64 pg_ltostr()? Are you
> aware of an appreciable overhead from having int32 callers just rely
> on a promotion to int64?
>
>
I'd not thought of that. It would certainly add unnecessary overhead on
32-bit machines.
To be honest, I don't really like the idea of making fsec_t an int64; there
are just too many places around the code base that would need to be updated.
Plus, with float datetimes we'd need to multiply the fractional seconds by
10000000000, which is based on the MAX_TIME_PRECISION setting as defined
when HAVE_INT64_TIMESTAMP is undefined.
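(At 10 decimal digits the scaled fractional seconds can reach 9999999999,
well past INT32_MAX at 2147483647, hence the need for an int64 version of
pg_ltostr().)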

> I have a few minor nitpicks about the patch.
>
> No need for "negative number" comment here -- just use
> "Assert(precision >= 0)" code:
>
> + * Note 'precision' must not be a negative number.
> + * Note callers should assume cp will not be NUL terminated.
> */
> -static void
> +static char *
> AppendSeconds(char *cp, int sec, fsec_t fsec, int precision, bool fillzeros)
> {
> +#ifdef HAVE_INT64_TIMESTAMP
> +
>
> I would just use a period/full stop here:
>
> + * Note str is not NUL terminated, callers are responsible for NUL
> + * terminating str themselves.
>
>
Do you mean change the comment to "Note str is not NUL terminated. Callers
are responsible for NUL terminating str themselves." ?

> Don't think you need so many "Note:" specifiers here:
>
> + * Note: Callers should ensure that 'padding' is above zero.
> + * Note: This function is optimized for the case where the number is not
> + * too big to fit inside of the specified padding.
> + * Note: Caller must ensure that 'str' points to enough memory to hold the
> + * result (at least 12 bytes, counting a leading sign and trailing NUL,
> + * or padding + 1 bytes, whichever is larger).
> + */
> +char *
> +pg_ltostr_zeropad(char *str, int32 value, int32 padding)
>
>
Yes, that's a bit ugly. How about an Assert(padding > 0), just like above?
That gets rid of one Note.
The optimized part is probably not that important anyway. I just mentioned
it to try to reduce the chances of someone using it when the padding is
always too small.
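In other words, something along these lines for the header comment (just a
rough sketch of what I have in mind, not final wording):

/*
 * pg_ltostr_zeropad
 *		Converts 'value' into a decimal string representation stored at
 *		'str', zero-padding the output up to a minimum of 'padding'
 *		characters.
 *
 * Caller must ensure that 'str' points to enough memory to hold the result
 * (at least 12 bytes, counting a leading sign and trailing NUL, or
 * padding + 1 bytes, whichever is larger).
 */
char *
pg_ltostr_zeropad(char *str, int32 value, int32 padding);

with Assert(padding > 0) as the first statement of the function body in
place of the first Note, and the optimization Note dropped.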

> Not so sure about this, within the new pg_ltostr_zeropad() function:
>
> + /*
> + * Handle negative numbers in a special way. We can't just append a '-'
> + * prefix and reverse the sign as on two's complement machines negative
> + * numbers can be 1 further from 0 than positive numbers, we do it
> + * this way so we properly handle the smallest possible value.
> + */
> + if (num < 0)
> + {
> ...
> + }
> + else
> + {
> ...
> + }
>
> Maybe it's better to just special case INT_MIN and then do an Abs()?
> Actually, would any likely caller of pg_ltostr_zeropad() actually care
> if INT_MIN was a case that you cannot handle, and assert against? You
> could perhaps reasonably make it the caller's problem. Haven't made up
> my mind if your timestamp_delta.patch is better than that; cannot
> immediately see a problem with putting it on the caller.
>
>
I can make pg_ltostr() handle that case the same way pg_ltoa() does; that's
not a problem, and I did that in the delta patch. With pg_ltostr_zeropad()
I'd need to add some corner-case code to prepend the correct number of zeros
onto -2147483648. That's unlikely to look very nice, and it will probably
end up being about the same number of lines of code as what's in the patch
already. If the thinking is that it's better for performance, the tests I've
already done show that it's not. I quite like my negative number handling,
since it's code that would actually be exercised 50% of the time rather than
1 / 2^32 of the time, assuming all numbers have an equal chance of being
passed.
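To sketch the idea in isolation (an illustration of the approach only,
written assuming the usual Postgres headers and that the result fits within
'padding' characters, so not the actual patch code):

/*
 * Illustration only: emit the digits while 'value' is still negative, so
 * each remainder is negated one digit at a time and INT_MIN never needs
 * to be negated as a whole.
 */
static void
zeropad_negative_sketch(char *str, int32 value, int32 padding)
{
	char	   *end = str + padding;

	Assert(value < 0 && padding > 1);

	*str++ = '-';			/* sign takes one of the 'padding' characters */
	while (end > str)
	{
		int32		remainder = value % 10;	/* between -9 and 0 here */

		*--end = (char) ('0' - remainder);	/* negate digit by digit */
		value /= 10;
	}
}

The positive case is just the mirror image, and no INT_MIN special case or
extra zero-prepending logic is needed.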

> More generally, maybe routines like EncodeDateTime() ought now to use
> the Abs() macro.
>
>
You might be right about that. Then we could just have pg_ltostr() handle
unsigned values only. The reason I didn't do that is that I was trying to
minimise what I was changing in the patch, and I thought it was less likely
to make it in if I started adding calls to Abs() around the code base.

Perhaps it's good to have pg_ltostr() there so that we have a good, fast way
to build generic strings in Postgres incrementally. I'd say we'd have a much
less reusable function if we only supported unsigned int, as we don't have a
database type which is based on unsigned int.
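For example, the sort of incremental use I have in mind looks something like
this (a rough sketch only; hour/min/sec stand in for whatever the caller
has, and it assumes the function returns a pointer to the new end of the
string):

int32		hour = 13,
			min = 20,
			sec = 47;
char		buf[MAXDATELEN + 1];
char	   *str = buf;

str = pg_ltostr_zeropad(str, hour, 2);
*str++ = ':';
str = pg_ltostr_zeropad(str, min, 2);
*str++ = ':';
str = pg_ltostr_zeropad(str, sec, 2);
*str = '\0';				/* callers NUL terminate themselves */

That style works just as well with signed values, which is another reason
I'd rather not restrict it to unsigned.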

> Those are all of my nitpicks. :-)
>
>
Many thanks for looking at this. Nitpicks are fine :)

> > I also did some tests on server grade hardware, and the performance
> > increase is looking a bit better.
> >
> > create table ts (ts timestamp not null);
> > insert into ts select generate_series('2010-01-01 00:00:00', '2011-01-01
> > 00:00:00', '1 sec'::interval);
> > vacuum freeze ts;
>
> On my laptop, I saw a 2.4X - 2.6X speedup for this case. That seems
> pretty nice to me.
>
> Will make another pass over this soon.
>
> Thanks

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
timestamp_out_speedup_2015-11-05.patch application/octet-stream 22.4 KB
