Re: PoC: Partial sort

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: PoC: Partial sort
Date: 2017-01-31 05:53:05
Message-ID: CAB7nPqRbetvD_JdU-876_Bf=mZ7Z4-WGeWkaxCfvtGy+ARs-HQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 5, 2016 at 2:04 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>
>
> On Fri, Dec 2, 2016 at 4:05 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Tue, Sep 13, 2016 at 4:32 AM, Alexander Korotkov
>> <aekorotkov(at)gmail(dot)com> wrote:
>> > On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> >> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
>> >> <aekorotkov(at)gmail(dot)com> wrote:
>> >> > Hmm... I'm not completely agree with that. In typical usage partial
>> >> > sort
>> >> > should definitely use quicksort. However, fallback to other sort
>> >> > methods is
>> >> > very useful. Decision of partial sort usage is made by planner. But
>> >> > planner makes mistakes. For example, our HashAggregate is purely
>> >> > in-memory.
>> >> > In the case of planner mistake it causes OOM. I met such situation
>> >> > in
>> >> > production and not once. This is why I'd like partial sort to have
>> >> > graceful
>> >> > degradation for such cases.
>> >>
>> >> I think that this should be moved to the next CF, unless a committer
>> >> wants to pick it up today.
>> >
>> > Patch was rebased to current master.
>>
>> Just a few quick observations on this...
>>
>> It strikes me that the API contract change in ExecMaterializesOutput
>> is pretty undesirable. I think it would be better to have a new
>> executor node for this node rather than overloading the existing
>> "Sort" node, sharing code where possible of course. The fact that
>> this would distinguish them more clearly in an EXPLAIN plan seems
>> good, too. "Partial Sort" is the obvious thing, but there might be
>> even better alternatives -- maybe "Incremental Sort" or something like
>> that? Because it's not partially sorting the data, it's making data
>> that already has some sort order have a more rigorous sort order.
>>
>> I think that it will probably be pretty common to have queries where
>> the data is sorted by (mostly_unique_col) and we want to get it sorted
>> by (mostly_unique_col, disambiguation_col). In such cases I wonder if
>> we'll incur a lot of overhead by feeding single tuples to the
>> tuplesort stuff and performing lots of 1-item sorts. Not sure if that
>> case needs any special optimization.
>>
>> I also think that the "HeapTuple prev" bit in SortState is probably
>> not the right way of doing things. I think that should use an
>> additional TupleTableSlot rather than a HeapTuple.
>>
>
> The feedback from the reviewer has received at the end of commitfest.
> So Moved to next CF with "waiting on author" status.

This patch is on its 6th commit fest now. As the thread has died and
as feedback has been provided but not answered I am marking it as
returned with feedback.
--
Michael

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-01-31 05:53:16 Re: IF (NOT) EXISTS in psql-completion
Previous Message Michael Paquier 2017-01-31 05:51:08 Re: IF (NOT) EXISTS in psql-completion