Re: Yet another abort-early plan disaster on 9.3

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-12-05 15:45:23
Message-ID: CAHyXU0yjuCwH47qv=Lnx3BaSu69CfdKadc11oveTcB9T4K0hGQ@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Fri, Dec 5, 2014 at 12:46 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 30 September 2014 at 05:53, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 29 September 2014 16:00, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>> The problem, as I see it, is different. We assume that if there are
>>>> 100 distinct values and you use LIMIT 1, you would only need to
>>>> scan 1% of the rows. We assume that the data is arranged in the table
>>>> in a very homogeneous layout. When it is not, and it seldom is, we get
>>>> problems.
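
A minimal sketch of the kind of layout that breaks that assumption (table
and column names are hypothetical, numbers are illustrative):

-- 1M rows, 100 distinct values of col, but each value sits in one
-- contiguous band of the table instead of being spread evenly
CREATE TABLE t AS
  SELECT g AS id, (g - 1) / 10000 AS col
  FROM generate_series(1, 1000000) g;
CREATE INDEX ON t (id);
ANALYZE t;

-- The planner assumes col = 99 is mixed uniformly through the id
-- ordering, so with LIMIT 1 it expects the ordered index scan to stop
-- after ~100 rows.  Here col = 99 lives at the far end of the id range,
-- so the scan filters through roughly 990,000 rows before returning.
EXPLAIN ANALYZE
SELECT * FROM t WHERE col = 99 ORDER BY id LIMIT 1;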
>>>
>>> Hm, good point -- 'data proximity'. At least in theory, can't this be
>>> measured and quantified? For example, given a number of distinct
>>> values, you could estimate the % of pages (or maybe the non-sequential
>>> seeks relative to the number of pages) you'd need to read to reach
>>> all instances of a particular value in the average (or perhaps the
>>> worst) case. One way of trying to calculate that would be to look at
>>> the proximity of values in sampled pages (and maybe assign a penalty
>>> for high update activity relative to table size). Data proximity would
>>> then become a cost coefficient applied to the benefits of LIMIT.
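
One crude way to eyeball that kind of per-value spread today, reusing the
sketch table above (this is only an ad-hoc query, not a proposal for how
ANALYZE would sample it; the ctid-to-point cast is the usual trick for
pulling out the heap block number):

-- how many distinct heap pages hold each value; a widely scattered
-- value touches many more pages than a tightly clustered one
SELECT col,
       count(DISTINCT (ctid::text::point)[0]) AS pages_touched,
       count(*) AS n_rows
FROM t
GROUP BY col
ORDER BY pages_touched DESC;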
>>
>> The necessary first step to this is to realise that we can't simply
>> apply the LIMIT as a reduction in query cost, in all cases.
>>
>> The way I'm seeing it, you can't assume the LIMIT will apply to any
>> IndexScan that doesn't have an index condition. If it has just a
>> filter, or nothing at all but an ordering, then it could easily scan
>> the whole index if the stats are wrong.
>>
>> So plans like these could be wrong, because they assume the LIMIT will
>> end the scan earlier than it actually will.
>>
>> Limit
>>   IndexScan (no index cond)
>>
>> Limit
>>   NestJoin
>>     IndexScan (no index cond)
>>     SomeScan
>>
>> Limit
>>   NestJoin
>>     NestJoin
>>       IndexScan (no index cond)
>>       SomeScan
>>     SomeScan
>>
>> and deeper...
>>
>> I'm looking for a way to identify and exclude such plans, assuming
>> that this captures at least some of the problem plans.
>
> After looking at this for some time I now have a patch that solves this.
>
> It relies on the observation that index scans with no bounded quals
> don't play nicely with LIMIT. The solution builds on the point that
> LIMIT does not reduce the startup cost of plans, only the total cost.
> So we can solve the problem by keeping the total cost estimate the
> same and just moving some of it into startup cost, so that LIMIT does
> not reduce costs as much as before.
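
To make the arithmetic concrete (this is my reading of the mechanism
described above, with made-up numbers, not the patch itself): a LIMIT is
costed roughly as startup cost plus the run cost scaled by the fraction
of rows it expects to fetch, so shifting run cost into startup cost
shrinks the discount the LIMIT gets.

-- index scan costed at startup 0 / total 100000 over 1M rows, LIMIT 10,
-- vs. the same scan with half the cost charged as startup
SELECT 0     + (100000 - 0)     * 10 / 1000000.0 AS limit_cost_now,
       50000 + (100000 - 50000) * 10 / 1000000.0 AS limit_cost_shifted;
-- ~1 vs ~50000: with part of the cost charged up front, the LIMIT no
-- longer makes an unbounded index scan look nearly free.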
>
> It's a simple patch, but it solves the test cases I know about and
> does almost nothing to planning time.
>
> I tried much less subtle approaches involving direct prevention of
> LIMIT pushdown but the code was much too complex for my liking.

Neat -- got any test cases (would this have prevented OP's problem)?

merlin
