Re: Yet another abort-early plan disaster on 9.3

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-12-05 06:46:15
Message-ID: CA+U5nMKiLkryw=Xq3-M4rWGF+rmxMfDy0DMeViv_JmMxdERgyA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On 30 September 2014 at 05:53, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 29 September 2014 16:00, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> The problem, as I see it, is different. We assume that if there are
>>> 100 distinct values and you use LIMIT 1, you would only need to
>>> scan 1% of rows. We assume that the data is arranged in the table in a
>>> very homogeneous layout. When data is not, and it seldom is, we get
>>> problems.
>>
>> Hm, good point -- 'data proximity'. At least in theory, can't this be
>> measured and quantified? For example, given a number of distinct
>> values, you could estimate the % of pages (or maybe the number of
>> non-sequential seeks relative to the number of pages) you'd need to
>> read to find all instances of a particular value in the average (or
>> perhaps the worst) case. One way of trying to calculate that would be
>> to look at the proximity of values in sampled pages (and maybe assign
>> a penalty for high update activity relative to table size). Data
>> proximity would then become a cost coefficient applied to the benefits
>> of LIMIT.
>
> The necessary first step to this is to realise that we can't simply
> apply the LIMIT as a reduction in query cost, in all cases.
>
> The way I'm seeing it, you can't assume the LIMIT will apply to any
> IndexScan that doesn't have an index condition. If it has just a
> filter, or nothing at all beyond an ordering, then it could easily
> scan the whole index if the stats are wrong.
>
> So plans like the following could be costed wrongly, because we assume
> the LIMIT will end the scan earlier than it actually does.
>
> Limit
>   IndexScan (no index cond)
>
> Limit
>   NestJoin
>     IndexScan (no index cond)
>     SomeScan
>
> Limit
>   NestJoin
>     NestJoin
>       IndexScan (no index cond)
>       SomeScan
>     SomeScan
>
> and deeper...
>
> I'm looking for a way to identify and exclude such plans, assuming
> that this captures at least some of the problem plans.
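
To make the failure mode concrete, here is a rough standalone sketch of
the arithmetic (invented numbers, and only an approximation of how the
planner discounts cost under a LIMIT; it is not taken from the planner
source or from the attached patch):

/* Rough illustration of how LIMIT discounts the cost of an unbounded
 * index scan.  All numbers here are invented. */
#include <stdio.h>

/* Roughly what a Limit node does: charge startup cost plus the fraction
 * of the run cost needed to produce limit_rows of the estimated rows. */
static double
limited_cost(double startup, double total, double est_rows, double limit_rows)
{
    double frac = limit_rows / est_rows;

    if (frac > 1.0)
        frac = 1.0;
    return startup + (total - startup) * frac;
}

int
main(void)
{
    /* An index scan with no index condition, only a filter: cheap to
     * start, but its total cost covers reading the whole index and heap. */
    double startup = 0.5;
    double total = 100000.0;
    double est_rows = 10000.0;    /* planner's guess at matching rows */

    printf("estimate under LIMIT 10: %.1f\n",
           limited_cost(startup, total, est_rows, 10));
    printf("worst case (scan everything): %.1f\n", total);
    return 0;
}

With these made-up numbers the LIMIT 10 estimate comes out around 100,
even though the worst case really is the full 100000; that gap is the
abort-early trap in the plan shapes above.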

After looking at this for some time I now have a patch that solves this.

It relies on the observation that index scans with no bounded quals
don't play nicely with LIMIT. The solution builds on the fact that
LIMIT does not reduce the startup cost of a plan, only the total cost.
So we can solve the problem by keeping the total cost estimate the same
but moving some of it into startup cost, so that LIMIT does not reduce
the cost as much as before.
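
Sketching the same arithmetic with the adjustment applied (invented
numbers again, and an arbitrary 50% shift chosen purely for
illustration; the real change is in the attached patch, not in this toy
program):

/* Toy before/after comparison of the startup-cost shift.  All numbers,
 * including the 50% shift, are invented. */
#include <stdio.h>

int
main(void)
{
    double total = 100000.0;        /* full scan of index plus heap */
    double frac = 10.0 / 10000.0;   /* LIMIT 10 of ~10000 estimated rows */

    /* Before: nearly all of the cost is run cost, so LIMIT discounts
     * almost all of it. */
    double old_startup = 0.5;

    printf("old estimate: %.1f\n",
           old_startup + (total - old_startup) * frac);

    /* After: treat, say, half of the run cost as startup cost.  The
     * total is unchanged, but LIMIT can only discount the remaining run
     * part, so the plan no longer looks artificially cheap. */
    double new_startup = total * 0.5;

    printf("new estimate: %.1f\n",
           new_startup + (total - new_startup) * frac);
    return 0;
}

The total cost estimate itself is unchanged; only the share of it that
the LIMIT is allowed to discount shrinks.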

It's a simple patch, but it solves the test cases I know about and
does almost nothing to planning time.

I tried much less subtle approaches involving direct prevention of
LIMIT pushdown, but the code was much too complex for my liking.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment: avoid_limit_pushdown.v3.patch (application/octet-stream, 1.2 KB)
