Re: proposal: be smarter about i/o patterns in index scan

From: "Jeffrey W(dot) Baker" <jwb(at)gghcwest(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Glen Parker <glenebob(at)nwlink(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: be smarter about i/o patterns in index scan
Date: 2004-05-19 20:07:40
Message-ID: 1084997260.19249.16.camel@heat
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2004-05-19 at 12:55, Tom Lane wrote:
> "Glen Parker" <glenebob(at)nwlink(dot)com> writes:
> > What am I missing? Why is a performance bottle neck of this magnitude not
> > on the same list of priorities as PITR, replication, and Win32?
>
> It's higher on my personal to-do list than most of those ;-). But those
> things are getting done because there are other developers with other
> priorities. I suspect also that the set of people competent to make
> this change is much smaller than the set of people able to work on the
> other points. In my mind, most of the issue is in the planner (figuring
> out what to do with an unsorted-indexscan option) and relatively few
> people have wanted to touch the planner.
>
> > Here's one answer: If you had to sort every result set, even when an index
> > could have been used, overall performance would still improve by a very
> > large margin. I'd bet money on it.
>
> For a counterexample I refer you to our standard solution for
> MAX-using-an-index:
>
> SELECT ... FROM table ORDER BY foo DESC LIMIT 1;

Yes, that would stink with unordered index result.

> which would become truly spectacularly bad without an ordered index
> scan. A more general point is that for any indexscan that returns only
> a small number of index entries (eg, any unique-key search) worrying
> about physical-order access will be wasted effort. The best you could
> hope for is not to be significantly worse than the existing code in such
> cases, and I'm unconvinced you could achieve even that.

I don't think a TID-sorted index scan would be worse than the current
unsorted scan. You will produce a list of 1, sort that, and fetch the
tuple. In the current implementation, you only omit the sort, which is
basically free.

> I can assure you that any patch that completely removes the existing
> behavior will be rejected, because there are plenty of cases where it's
> the right thing.

Okay.

> The main thing that unordered indexscan could do for us is extend the
> usefulness of indexscan plans into relatively-poor-selectivity cases
> where we presently tend to drop back to seqscans. There would still be
> a selectivity threshold above which you might as well use seqscan, but
> it ought to be higher than the fraction-of-a-percent that we currently
> see for indexscans. What is unknown, and will be unknown until someone
> tries it, is just what range of selectivity this technique might win
> for. I think such a range exists; I am not real certain that it is wide
> enough to justify a lot of effort in making the idea a reality.

I don't think selectivity is the main issue. I think the absolute size
of the result is the issue. The current implementation fairly well
garauntees one seek for each tuple of the result. Which means that the
cost to access the result is something on the order of 10ms * rows. So
for any significant result that needs to be fetched from the heap, the
seek cost is going to overwhelm all other costs quickly.

Even though my index is selective, retrieving only 200k rows of 240m row
table, the cost of the sort is justified.

-jwb

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Eugeny Balakhonov 2004-05-19 20:07:57 PostgreSQL performance in simple queries
Previous Message Joshua D. Drake 2004-05-19 19:59:48 Re: Call for 7.5 feature completion