Re: BUG #7627: Bad query plan when using LIMIT

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: edward(at)clericare(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7627: Bad query plan when using LIMIT
Date: 2012-10-30 09:23:17
Message-ID: CA+U5nMK_TatpM9r7Z9OzZ7Kv-6cmE8Pdw_wrRnyxUypDkAFADQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 30 October 2012 05:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> edward(at)clericare(dot)com writes:
>> The following two queries differ only by adding "LIMIT 1", but the one with
>> the limit gets radically worse performance. I've done VACUUM FULL, VACUUM
>> ANALYZE, and REINDEX DATABASE and there are no modifications since.
>
>> EXPLAIN ANALYZE SELECT * FROM commits WHERE id IN (SELECT id FROM commits
>> ORDER BY tree_high LIMIT 605 ) AND tree_other IS NULL ORDER BY tree_high
>> DESC;
>
>> EXPLAIN ANALYZE SELECT * FROM commits WHERE id IN (SELECT id FROM commits
>> ORDER BY tree_high LIMIT 605 ) AND tree_other IS NULL ORDER BY tree_high
>> DESC LIMIT 1;
>
> I think what's happening there is that the planner supposes that an
> indexscan in "tree_high DESC" order will find rows matching the IN
> condition uniformly distributed in the scan order --- but, because of
> the construction of the IN clause, they're actually going to be
> pessimally located at the very end of that scan order. So it ends up
> forming all of the nestloop result, when it had expected to have to
> compute only 1/595 of it. We've discussed dialing down the planner's
> optimism about limit plans to not assume perfect independence of filter
> conditions, but I don't think anyone would advocate for having it assume
> the worst possible case, which is what you've got here unfortunately.

Very bad plans with LIMIT are frequent. This is bad for us because
adding LIMIT usually/is supposed to make queries faster, not slower.

We need to do something.

LIMIT optimistically assumes the very best case, and that best case
gets better the number of rows filtered away.

If we are scanning N rows with LIMIT M we must assume that the action
relates in some way to N, rather than assuming the LIMIT is more and
more effective as N/M increases.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message gtsal 2012-10-30 10:54:53 BUG #7628: Installation of PostgreSQL 9.2.1 on Windows 7 may take too long
Previous Message Tianyin Xu 2012-10-30 05:38:46 Re: BUG #7624: Misleading Log Message & Inconsistent Configuration Design