Re: PoC: Partial sort

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Jeremy Harris <jgh(at)wizmail(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PoC: Partial sort
Date: 2014-01-20 12:43:27
Message-ID: CAPpHfdsiRPaqn8DTty2DywkuOrXJJcJBQUiNy9Ossm1LDfjXwQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 19, 2014 at 5:57 AM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:

> On 01/18/2014 08:13 PM, Jeremy Harris wrote:
>
>> On 31/12/13 01:41, Andreas Karlsson wrote:
>>
>>> On 12/29/2013 08:24 AM, David Rowley wrote:
>>>
>>>> If it was possible to devise some way to reuse any
>>>> previous tuplesortstate perhaps just inventing a reset method which
>>>> clears out tuples, then we could see performance exceed the standard
>>>> seqscan -> sort. The code the way it is seems to lookup the sort
>>>> functions from the syscache for each group then allocate some sort
>>>> space, so quite a bit of time is also spent in palloc0() and pfree()
>>>>
>>>> If it was not possible to do this then maybe adding a cost to the number
>>>> of sort groups would be better so that the optimization is skipped if
>>>> there are too many sort groups.
>>>>
>>>
>>> It should be possible. I have hacked a quick proof of concept for
>>> reusing the tuplesort state. Can you try it and see if the performance
>>> regression is fixed by this?
>>>
>>> One thing which have to be fixed with my patch is that we probably want
>>> to close the tuplesort once we have returned the last tuple from
>>> ExecSort().
>>>
>>> I have attached my patch and the incremental patch on Alexander's patch.
>>>
>>
>> How does this work in combination with randomAccess ?
>>
>
> As far as I can tell randomAccess was broken by the partial sort patch
> even before my change since it would not iterate over multiple tuplesorts
> anyway.
>
> Alexander: Is this true or am I missing something?

Yes, I decided that the Sort node shouldn't provide randomAccess when
skipCols != 0. See the assert at the beginning of ExecInitSort. I decided that
it would be better to add an explicit Materialize node than to store extra
tuples in the tuplesortstate each time.
I also adjusted ExecSupportsMarkRestore and ExecMaterializesOutput to make
the planner believe so. I found that path->pathtype is never T_Sort.
Correct me if I'm wrong.

Other changes in this version of the patch:
1) Applied Marti Raudsepp's patch to not compare skipCols in tuplesort.
2) Adjusted the sort bound after processing buckets.
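
To illustrate the two changes listed above, here is a minimal standalone sketch in C. It is not code from the patch and does not use any PostgreSQL API. The Row type, cmp_b and partial_sort are hypothetical names invented for this example, with column "a" standing in for the presorted prefix (skipCols = 1). It assumes that "bucket" means a run of rows with equal prefix values, and that adjusting the bound means reducing a LIMIT-style bound by the number of rows each bucket has already produced. For simplicity it sorts the last bucket in full, where a real implementation would probably use a bounded sort.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical two-column row: 'a' is the presorted prefix (skipCols = 1),
 * 'b' is the column that still needs sorting. */
typedef struct { int a; int b; } Row;

/* Compare only the non-prefix column: inside one bucket every 'a' is equal,
 * so comparing it again would be wasted work. */
static int cmp_b(const void *x, const void *y)
{
    const Row *l = x, *r = y;
    return (l->b > r->b) - (l->b < r->b);
}

/* Bring rows that are already ordered by 'a' into (a, b) order, one bucket
 * at a time. 'bound' is the number of rows wanted; a negative value means
 * no limit. Returns how many leading rows are in their final order. */
static size_t partial_sort(Row *rows, size_t n, long bound)
{
    size_t start = 0;

    assert(rows != NULL || n == 0);

    while (start < n && bound != 0)
    {
        /* Find the end of the current bucket (run of equal 'a'). */
        size_t end = start + 1;
        while (end < n && rows[end].a == rows[start].a)
            end++;

        qsort(rows + start, end - start, sizeof(Row), cmp_b);

        /* Adjust the bound by what this bucket contributed. */
        size_t emitted = end - start;
        if (bound > 0)
        {
            if ((size_t) bound < emitted)
                emitted = (size_t) bound;
            bound -= (long) emitted;
        }
        start += emitted;
    }
    return start;
}
```

Called with a bound of 3 on six rows forming buckets of sizes 2, 3 and 1, the sketch sorts the first two buckets, reports three rows as final, and never touches the third bucket. Skipping later buckets like this is where the saving over a full sort would come from.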

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
partial-sort-6.patch.gz application/x-gzip 17.0 KB
