| From: | Geoff Winkless <pgsqladmin(at)geoff(dot)dj> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres General <pgsql-general(at)postgresql(dot)org> | 
| Subject: | Re: index problems (again) | 
| Date: | 2016-03-07 16:37:37 | 
| Message-ID: | CAEzk6fdjPxaVNXXH0TiqPacJDOvt7z-Fma=iXtKOrpYY5CivZg@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
On 7 March 2016 at 16:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> In English, what that plan is trying to do is scan the index
> in sc_id order until it hits a row with scdate in the target range.
> The first such row, by definition, has the correct min(sc_id) value.
> The problem is that we're guessing at how soon we'll hit such a row.
> If the columns are independent, then the planner can guess based on how
> many rows in the whole table have scdate in the target range, and it
> will probably be about right.  But that estimate can fall down very
> badly if sc_id and scdate increase together, because then the target
> rows aren't randomly distributed in the index sequence but could all be
> all the way at the far end of the index.
I'm sorry, I'm obviously not being clear. I already accepted this
argument when Victor gave it, although I believe that in part it falls
down because sc_id is also (potentially) randomly distributed so it's
not like you're doing a sequential table scan (it might work better on
a clustered table, but we don't have those :) )
So you still have an extra layer of indirection into a large table
with lots of random accesses.
> If we had cross-column correlation stats we could detect this pitfall,
> but without that it's hard to do.
But as far as I can see, apart from the absolute extremes, the
index-only scan is _always_ going to be quicker than the index+table
scan. It doesn't matter whether or not the distribution is random or
skewed, the index-only scan is going to be better (or approximately
equally as good). We can see that by the massive speedup I get by
using index(scid,scdate), which in all other respects is going to
suffer from exactly the same problem from that the scid-only index
suffers.
And the real advantage: at the extremes, the index-only worst-case is
minimally worse than the best case. Whereas the worst-case of the
index-scan-plus-table-compare method is horrific.
I don't believe you need any further statistics than what is currently
available to be able to make that judgement, and that's why I believe
it's suboptimal.
Geoff
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2016-03-07 16:44:38 | Re: index problems (again) | 
| Previous Message | Adrian Klaver | 2016-03-07 16:35:39 | Re: Inserting JSON via Java PreparedStatment |