Re: PoC: Partial sort

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: PoC: Partial sort
Date: 2014-09-15 11:58:41
Message-ID: CAPpHfdvNiWvMQgUvk0GVmRCQMRVUY6GPasPCvD-rwsiYJEWaWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 14, 2014 at 9:32 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> I think we might be better off if a tuplesort function was called
> shortly after tuplesort_begin_heap() is called. How top-n heap sorts
> work is something that largely lives in tuplesort's head. Today, we
> call tuplesort_set_bound() to hint to tuplesort "By the way, this is a
> top-n heap sort applicable sort". I think that with this patch, we
> should then hint (where applicable) "by the way, you won't actually be
> required to sort those first n indexed attributes; rather, you can
> expect to scan those in logical order. You may work the rest out
> yourself, and may be clever about exploiting the sorted-ness of the
> first few columns". The idea of managing a bunch of tiny sorts from
> with ExecSort(), and calling the new function tuplesort_reset() seems
> questionable. tuplesortstate is supposed to be private/opaque to
> nodeSort.c, and the current design strains that.
>
> I'd like to keep nodeSort.c simple. I think it's pretty clear that the
> guts of this do not belong within ExecSort(), in any case. Also, the
> additions there should be much better commented, wherever they finally
> end up.
>

As I understand, you propose to incapsulate partial sort algorithm into
tuplesort. However, in this case we anyway need some significant change of
its interface: let tuplesort decide when it's able to return tuple.
Otherwise, we would miss significant part of LIMIT clause optimization.
tuplesort_set_bound() can't solve all the cases. There could be other
planner nodes between the partial sort and LIMIT.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2014-09-15 12:38:06 Re: [v9.5] Custom Plan API
Previous Message Alexander Korotkov 2014-09-15 11:49:43 Re: PoC: Partial sort