Re: Yet another abort-early plan disaster on 9.3

From: Ryan Johnson <ryan(dot)johnson(at)cs(dot)utoronto(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-02 13:59:00
Message-ID: 542D5A24.1080308@cs.utoronto.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 29/09/2014 9:00 AM, Merlin Moncure wrote:
> On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> The problem, as I see it, is different. We assume that if there are
>> 100 distinct values and you use LIMIT 1 that you would only need to
>> scan 1% of rows. We assume that the data is arranged in the table in a
>> very homogenous layout. When data is not, and it seldom is, we get
>> problems.
> Hm, good point -- 'data proximity'. At least in theory, can't this be
> measured and quantified? For example, given a number of distinct
> values, you could estimate the % of pages read (or maybe non
> sequential seeks relative to the number of pages) you'd need to read
> all instances of a particular value in the average (or perhaps the
> worst) case. One way of trying to calculate that would be to look at
> proximity of values in sampled pages (and maybe a penalty assigned for
> high update activity relative to table size). Data proximity would
> then become a cost coefficient to the benefits of LIMIT.
Latecomer to the conversation here, but it seems like this issue (unlike
some) is really easy to recognize at runtime. The optimizer assumed the
scan would access O(1) pages; if the scan has not returned enough
results after k pages, that would be a really good indication that it's
time to rethink the plan, and probably before too much work has been
done higher in the plan (esp. if there's any kind of buffering between
operators, perhaps intentionally so in special cases like this)

Not sure pgsql has any dynamic reoptimization infrastructure in place,
tho. If not, these sorts of dangerous plans are best left alone IMO.

Ryan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-10-02 14:28:48 Re: Per table autovacuum vacuum cost limit behaviour strange
Previous Message Stephen Frost 2014-10-02 13:58:11 Re: WITH CHECK and Column-Level Privileges

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2014-10-02 15:34:35 Re: auto vaccum is dying
Previous Message Peter Geoghegan 2014-10-02 09:30:02 Re: Yet another abort-early plan disaster on 9.3