Re: Yet another abort-early plan disaster on 9.3

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-09-29 15:00:26
Message-ID: CAHyXU0wFAKiwgybbEAX=597-MxtPGdf0bmQvRSJq9db1KTZ6nA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The problem, as I see it, is different. We assume that if there are
> 100 distinct values and you use LIMIT 1 that you would only need to
> scan 1% of rows. We assume that the data is arranged in the table in a
> very homogeneous layout. When data is not, and it seldom is, we get
> problems.

Hm, good point -- 'data proximity'. At least in theory, can't this be
measured and quantified? For example, given a number of distinct
values, you could estimate the percentage of pages you'd need to read
(or maybe the number of non-sequential seeks relative to the number
of pages) to find all instances of a particular value in the average
(or perhaps the worst) case. One way to calculate that would be to
look at the proximity of values in sampled pages (maybe with a
penalty assigned for high update activity relative to table size).
Data proximity would then become a cost coefficient applied to the
benefits of LIMIT.
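
To make that concrete, here's a back-of-the-envelope sketch (Python;
the names 'proximity' and 'rows_per_page' and the formula itself are
just illustrative assumptions, not anything the planner does today)
of how such a coefficient might fall out of a page-level sample:

# Hypothetical sketch of a per-value "data proximity" statistic
# computed from sampled (page, value) pairs.  The formula is an
# assumption for illustration, not planner code.
from collections import defaultdict
from math import ceil

def proximity(sample, rows_per_page):
    """sample: iterable of (page_no, value) pairs from sampled pages.

    Returns {value: coefficient in (0, 1]}, where 1.0 means the
    value's rows occupy as few pages as physically possible and
    values near 0 mean the rows are scattered across many pages.
    """
    pages = defaultdict(set)   # value -> set of pages it appears on
    counts = defaultdict(int)  # value -> number of sampled rows
    for page_no, value in sample:
        pages[value].add(page_no)
        counts[value] += 1
    result = {}
    for value, rows in counts.items():
        # best case: the value's rows are packed into this many pages
        min_pages = ceil(rows / rows_per_page)
        result[value] = min_pages / len(pages[value])
    return result

# Toy example: 'a' clustered on one page, 'b' scattered over four.
sample = [(1, 'a'), (1, 'a'), (1, 'a'), (1, 'a'),
          (2, 'b'), (5, 'b'), (9, 'b'), (14, 'b')]
print(proximity(sample, rows_per_page=4))  # {'a': 1.0, 'b': 0.25}

A perfectly clustered value scores 1.0 and a one-row-per-page value
scores near the reciprocal of its row count, so the coefficient could
directly scale the expected fraction of the table an abort-early plan
has to touch.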

merlin
