| From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
|---|---|
| To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
| Cc: | Josh Berkus <josh(at)agliodbs(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org> |
| Subject: | Re: Yet another abort-early plan disaster on 9.3 |
| Date: | 2014-09-29 15:00:26 |
| Message-ID: | CAHyXU0wFAKiwgybbEAX=597-MxtPGdf0bmQvRSJq9db1KTZ6nA@mail.gmail.com |
| Lists: | pgsql-hackers pgsql-performance |
On Fri, Sep 26, 2014 at 3:06 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The problem, as I see it, is different. We assume that if there are
> 100 distinct values and you use LIMIT 1 that you would only need to
> scan 1% of rows. We assume that the data is arranged in the table in a
> very homogenous layout. When data is not, and it seldom is, we get
> problems.
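A quick numeric sketch of the assumption Simon describes above (not planner code; the table size is invented purely for illustration):

```python
# Sketch of the uniform-layout assumption quoted above (not PostgreSQL
# planner code; total_rows is a made-up figure for illustration).
total_rows = 1_000_000
ndistinct = 100

# If each value is assumed to be spread evenly through the heap, a scan
# driven by LIMIT 1 on one value is expected to stop after reading about
# total_rows / ndistinct rows.
expected_rows_scanned = total_rows / ndistinct
print(expected_rows_scanned)  # 10000.0, i.e. ~1% of the table

# If the wanted value is instead clustered at the far end of the table,
# the scan may read nearly every row before the first match -- the
# abort-early plan disaster under discussion.
```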
Hm, good point -- 'data proximity'. At least in theory, can't this be
measured and quantified? For example, given a number of distinct
values, you could estimate the % of pages (or maybe the number of
non-sequential seeks relative to the number of pages) you'd need to
read to find all instances of a particular value in the average (or
perhaps the worst) case. One way to calculate that would be to look at
the proximity of values in sampled pages (maybe with a penalty
assigned for high update activity relative to table size). Data
proximity would then become a cost coefficient applied to the benefits
of LIMIT.
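As a minimal sketch of what such a per-value proximity estimate could look like, computed from a hypothetical page-level sample (the function and inputs below are illustrative assumptions, not statistics PostgreSQL currently collects):

```python
# Rough sketch (not PostgreSQL internals): derive a "data proximity"
# coefficient per value from a hypothetical block sample. sampled_rows
# and total_pages stand in for data an ANALYZE-like pass might gather.

from collections import defaultdict

def proximity_coefficient(sampled_rows, total_pages):
    """
    sampled_rows: iterable of (page_number, value) pairs from a page sample.
    total_pages:  number of pages the sample was drawn from.

    Returns, per value, the fraction of sampled pages containing at least
    one instance of that value. A value scattered across most pages is
    expensive to collect, so LIMIT's abort-early benefit could be
    discounted by a coefficient like this; a well-clustered value stays cheap.
    """
    pages_with_value = defaultdict(set)
    for page, value in sampled_rows:
        pages_with_value[value].add(page)
    return {v: len(pages) / total_pages for v, pages in pages_with_value.items()}

# Example: 'a' is clustered on one page, 'b' is spread across all three.
sample = [(0, 'a'), (0, 'a'), (0, 'a'), (0, 'b'), (1, 'b'), (2, 'b')]
print(proximity_coefficient(sample, total_pages=3))
# {'a': 0.333..., 'b': 1.0} -> 'b' touches nearly every page, 'a' only ~1/3.
```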
merlin