Re: PoC: Partial sort

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: PoC: Partial sort
Date: 2014-09-12 21:19:55
Message-ID: CAPpHfdvvPozkiZPHoC6=AJO9bXS4mK6837GbBXibnWh=LkspRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 13, 2014 at 6:45 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Mon, Feb 10, 2014 at 10:59 AM, Alexander Korotkov
> <aekorotkov(at)gmail(dot)com> wrote:
> > Done. Patch is splitted.
>
> I took a quick look at this.
>
> Have you thought about making your new cmpSortSkipCols() function not
> use real comparisons? Since in the circumstances in which this
> optimization is expected to be effective (e.g. your original example)
> we can also expect a relatively low cardinality for the first n
> indexed attributes (again, as in your original example), in general
> when cmpSortSkipCols() is called there is a high chance that it will
> return true. If any pair of tuples (logically adjacent tuples fed in
> to cmpSortSkipCols() by an index scan in logical order) are not fully
> equal (i.e. their leading, indexed attributes are not equal) then we
> don't care about the details -- we just know that a new sort grouping
> is required.
>

Actually, higher cardinality skip columns is better. Sorting of smaller
groups is faster than sorting larger groups of same size. Also, with
smaller groups you achieve limit more accurate (in average), i.e. sort
smaller amount of total rows.

> The idea here is that you can get away with simple binary equality
> comparisons, as we do when considering HOT-safety. Of course, you
> might find that two bitwise unequal values are equal according to
> their ordinary B-Tree support function 1 comparator (e.g. two numerics
> that differ only in their display scale). AFAICT this should be okay,
> since that just means that you have smaller sort groupings than
> strictly necessary. I'm not sure if that's worth it to more or less
> duplicate heap_tuple_attr_equals() to save a "mere" n expensive
> comparisons, but it's something to think about (actually, there are
> probably less than even n comparisons in practice because there'll be
> a limit).
>

Not correct. Smaller groups are not OK. Imagine that two representations of
same skip column value exists. Index may return them in any order, even
change them one by one. In this case sorting on other column never takes
place, while it should. But some optimizations are still possible:

1. Use bitwise comparison first, then recheck. But, no guarantees that
acceleration will be achieved.
2. Use equality check instead of btree comparison. For "text" datatype
it would be rather faster because of no locale-aware comparison.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-09-12 21:22:58 Re: bad estimation together with large work_mem generates terrible slow hash joins
Previous Message Tomas Vondra 2014-09-12 21:09:03 Re: bad estimation together with large work_mem generates terrible slow hash joins