Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Date: 2015-12-21 19:55:50
Message-ID: CAM3SWZRMwm5Zc6ze1sou2Divkbq2LzF-kU_wRWs+NtuG+gJ9ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 21, 2015 at 10:53 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> PFA my proposal for comment changes for 9.5 and master. This is based
> on your 0001, but I edited somewhat. Please let me know your
> thoughts. I am not willing to go further and rearrange actual code in
> 9.5 at this point; it just isn't necessary.

Fine by me. But this revision hasn't made the important point at all
-- which is that 0002 is safe. That's a stronger guarantee than the
abbreviated key representation being pass-by-value.

> Looking at this whole system again, I wonder if we're missing a trick
> here. How about if we decree from on high that the abbreviated-key
> comparator is always just the application of an integer comparison
> operator? The only abbreviation function that we have right now that
> wouldn't be happy with that is the one for numeric, and that looks
> like it could be fixed.

I gather you mean that we'd decree that they were always compared
using a text or uuid style 3-way unsigned integer comparison, allowing
us to hardcode that.

> Then we could hard-code knowledge of this
> representation in tuplesort.c in such a way that it wouldn't need to
> call a comparator function at all - instead of doing
> ApplySortComparator() and then maybe ApplySortAbbrevFullComparator(),
> it would do something like:
>
> if (using_abbreviation && (compare = ApplyAbbrevComparator(...)) != 0)
> return compare;
>
> I'm not sure if that would save enough cycles vs. the status quo to be
> worth bothering with, but it seems like it might. You may have a
> better feeling for that than I do.

I like this idea. I thought of it myself already, actually.

It has some nice properties, because unsigned integers are so simple
and flexible. For example, we can do it with int4 sometimes, and then
compare two attributes at once on 64-bit platforms. Maybe the second
attribute (the second set of 4 bytes) isn't an int4, though -- it's
any other type's abbreviated key (which can be assumed to have a
compatible representation). That's one more advanced possibility.

It seems worthwhile because numeric is the odd one out. Once I or
someone else gets around to adding network types, which could happen
in 9.6, then we are done (because there is already a pending patch
that adds support for remaining character-like types and alternative
opclasses). I really don't foresee much additional use of abbreviated
keys beyond these cases. There'd be a small but nice boost by not
doing pointer chasing for the abbreviated key comparison, I imagine,
but that benefit would be felt everywhere. Numeric is the odd one out
currently, but as you say it can be fixed, and I foresee no other
trouble from any other opclass/type that cares about abbreviation.

Another issue is that abbreviated keys in indexes are probably not
going to take 8 bytes, because they'll go in the ItemId array. It's
likely to be very useful to be able to take the first two bytes, and
know that a uint16 comparison is all that is needed, and have a basis
to perform an interpolation search rather than a binary search
(although that needs to be adaptive to different distributions, I
think it could be an effective technique -- there are many cache
misses for binary searches on internal B-Tree pages).

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2015-12-21 20:09:48 Re: tracking owner of extension-managed objects
Previous Message Tomas Vondra 2015-12-21 19:51:09 Re: GIN data corruption bug(s) in 9.6devel