Re: PoC: Partial sort

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: PoC: Partial sort
Date: 2015-06-07 15:10:06
Message-ID: 55745ECE.6080009@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/15/2014 01:58 PM, Alexander Korotkov wrote:
> On Sun, Sep 14, 2014 at 9:32 AM, Peter Geoghegan <pg(at)heroku(dot)com
> <mailto:pg(at)heroku(dot)com>> wrote:
>
> I think we might be better off if a tuplesort function was called
> shortly after tuplesort_begin_heap() is called. How top-n heap sorts
> work is something that largely lives in tuplesort's head. Today, we
> call tuplesort_set_bound() to hint to tuplesort "By the way, this is a
> top-n heap sort applicable sort". I think that with this patch, we
> should then hint (where applicable) "by the way, you won't actually be
> required to sort those first n indexed attributes; rather, you can
> expect to scan those in logical order. You may work the rest out
> yourself, and may be clever about exploiting the sorted-ness of the
> first few columns". The idea of managing a bunch of tiny sorts from
> with ExecSort(), and calling the new function tuplesort_reset() seems
> questionable. tuplesortstate is supposed to be private/opaque to
> nodeSort.c, and the current design strains that.
>
> I'd like to keep nodeSort.c simple. I think it's pretty clear that the
> guts of this do not belong within ExecSort(), in any case. Also, the
> additions there should be much better commented, wherever they finally
> end up.
>
>
> As I understand, you propose to incapsulate partial sort algorithm into
> tuplesort. However, in this case we anyway need some significant change
> of its interface: let tuplesort decide when it's able to return tuple.
> Otherwise, we would miss significant part of LIMIT clause optimization.
> tuplesort_set_bound() can't solve all the cases. There could be other
> planner nodes between the partial sort and LIMIT.

Hi,

Are you planning to work on this patch for 9.6?

I generally agree with Peter that the changes to the sorting probably
belong in the tuplesort code rather than in the executor. This way it
should also be theoretically possible to support mark/restore.

Andreas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2015-06-07 20:01:23 Re: [CORE] Restore-reliability mode
Previous Message Fabien COELHO 2015-06-07 14:53:12 Re: checkpointer continuous flushing