Re: updated SORT/LIMIT patch

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: updated SORT/LIMIT patch
Date: 2007-05-15 14:07:08
Message-ID: 87wszaqiyb.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> [ greps a bit... ] It looks like the only way that you could expose the
> bug in the current state of the system would be if the sort/limit with
> the outer parameter were the inside of a nestloop join in the subplan.
> nodeNestloop would set EXEC_FLAG_REWIND, causing nodeSort to set
> randomAccess, allowing ExecReScanSort to suppose that it could rewind
> the sort.

I finally managed to trigger this case and found that the checks don't
actually work:

postgres=# SELECT (SELECT n
FROM (VALUES (1)) AS x,
(SELECT n FROM generate_series(1,10) AS n
ORDER BY n LIMIT 1 OFFSET s-1) AS y) AS z
FROM generate_series(1,10) AS s;
ERROR: retrieved too many tuples in a bounded sort

What's going on is that nodeLimit.c only invokes recompute_limit when the first
tuple is actually generated. It has a comment saying "(We can't do this any
earlier, because parameters from upper nodes may not be set until now.)"
So the checks are still comparing the previous bound against the boundDone.

Attached is a small patch which fixes this case. It also makes the check
slightly more liberal -- we don't need to resort if the previous sort was
unbounded or the bound was greater than or equal to the new bound.

There is one bit I'm not too sure of. We may or may not end up requesting
tuples from our child node. If we do we have to ReScan it but by then we don't
have the exprCtx passed to the ReScan call. I just made it call ReScan always
even if we later decide we can just rewind the tuplesort, is that ok?

Also, I left a comment that it would be nice if we could peek at the
tuplesort's boundUsed and state to avoid resorting unnecessarily. Currently it
pretty much always resorts unless you construct a bizarre query like the above
to force the randomAccess flag to be true. Most of the time tuplesort is going
to sort in memory anyways even if random access isn't requested and resorting
is pointless.

I think it would be worthwhile adding a method to tuplesort to ask whether
random access is possible and how many tuples were actually kept. Then
nodeSort could ask it those values instead of just remembering what values
were requested.

Attachment Content-Type Size
sortlimit-fix-v2.diff.gz application/octet-stream 1.8 KB

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Neil Conway 2007-05-15 15:54:09 Re: [PATCHES] Autovacuum and XID wraparound
Previous Message Alvaro Herrera 2007-05-15 13:07:18 Re: [PATCHES] Autovacuum and XID wraparound