Re: updated SORT/LIMIT patch

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: updated SORT/LIMIT patch
Date: 2007-05-16 13:39:24
Message-ID: 878xbo3n1v.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

"Alvaro Herrera" <alvherre(at)commandprompt(dot)com> writes:

> Gregory Stark wrote:
>
>> Attached is a small patch which fixes this case. It also makes the check
>> slightly more liberal -- we don't need to resort if the previous sort was
>> unbounded or the bound was greater than or equal to the new bound.
>
> Huh, can you clarify this comment:
>
> + * XXX It would be nice to check tuplesortstate->boundUsed too but that
> + * seems like an abstraction violation. And for that matter to check
> + * the tuplesort to see if randomaccess is possible even if it wasn't
> + * requested so we don't resort input when the parameters haven't
> + * changed if it was sorted in memory.
>
> I'm having serious trouble parsing it.

Well in the executor currently we check node->boundedDone and node->boundDone
to see whether the previous execution of this node had a bound. If it did and
we now don't or if it did but now our bound is larger then we have to
re-execute.

However the tuplesort may have actually done an unbounded sort either because
the bound (actually bound*2) may not have been reached or because it spilled
to disk and we did a disk sort. It would be nice to check that and not have to
reexecute unnecessarily.

But that means peeking into tuplesort's state. I started doing a patch
yesterday to do this by exporting a new method on tuplesort:

bool
tuplesort_random_access(Tuplesortstate *state, bool bounded, unsigned tuples_needed)
{
switch (state->status)
{
case TSS_FINALMERGE:
return false;
case TSS_SORTEDONTAPE:
return state->randomAccess;
case TSS_SORTEDINMEM:
return (!state->boundUsed || (bounded && state->bound > tuples_needed));
default:
return false;
}
}

That solves the abstraction barrier issue but I'm not sure if it's quite
detailed enough. Really Sort only needs to rewind the tuplesort which it can
do even if state->randomAccess is false. We may need two separate functions,
one which tests if rewind is supported and another which tests if random
access is supported. Also, I haven't looked at how the final merge case is
handled. It might be possible to support rescan (but not random access) in
that state as well.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Heikki Linnakangas 2007-05-16 13:59:54 Re: [DOCS] Autovacuum and XID wraparound
Previous Message Alvaro Herrera 2007-05-16 13:18:56 Re: updated SORT/LIMIT patch