Re: select distinct and index usage

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "David Wilson" <david(dot)t(dot)wilson(at)gmail(dot)com>, "Alban Hertroys" <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: select distinct and index usage
Date: 2008-04-08 11:37:29
Message-ID: 878wzo60hy.fsf@oxford.xeocode.com
Lists: pgsql-general

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Tom Lane wrote:
>>> What I think you'll find, though, is that once you do force an indexscan
>>> to be picked it'll be slower. Full-table index scans are typically
>>> worse than seqscan+sort, unintuitive though that may sound.

The original poster's implicit expectation is that an index scan would be
faster because it shouldn't have to visit every tuple. Once it's found a tuple
with a particular value it should be able to use the index to skip to the next
key value.

I thought our DISTINCT index scan did do that, but it still has to read the
index leaf pages sequentially. It doesn't backtrack up the tree structure and
re-find the next key.
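
To make the distinction concrete, here is a minimal Python sketch. It is not
PostgreSQL code: the sorted list stands in for the index leaf entries, and the
function names are hypothetical. The first function walks every entry in
order; the second re-searches for the first entry greater than the current
key, which is roughly what "re-finding the next key" would mean.

```python
from bisect import bisect_right

def distinct_by_leaf_scan(keys):
    """Walk every index entry in order, emitting a key whenever it changes."""
    out, visited = [], 0
    for k in keys:
        visited += 1
        if not out or out[-1] != k:
            out.append(k)
    return out, visited

def distinct_by_skip(keys):
    """After each distinct key, search for the first entry greater than it."""
    out, probes, pos = [], 0, 0
    while pos < len(keys):
        k = keys[pos]
        out.append(k)
        # Stand-in for descending the tree again to find the next key.
        pos = bisect_right(keys, k, pos)
        probes += 1
    return out, probes

# Assumed toy data: three distinct values, 1000 entries each.
keys = [1] * 1000 + [2] * 1000 + [3] * 1000
print(distinct_by_leaf_scan(keys))  # ([1, 2, 3], 3000)
print(distinct_by_skip(keys))       # ([1, 2, 3], 3)
```

Each probe in the second version is itself a logarithmic search, so the skip
approach only pays off when there are few distinct keys relative to the number
of entries.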

>> Hmm, should we switch the CLUSTER code to do that?
>
> It's been suggested before, but I'm not sure. The case where an
> indexscan can win is where the table is roughly in index order already.
> So if you think about periodic CLUSTER to maintain table ordering,
> I suspect you'd want the indexscan implementation for all but maybe
> the first time.

I think we would push a query through the planner to choose the best plan
based on the statistics. I'm not sure how this would play with the visibility
rules -- iirc not all scan types can be used with all visibility modes. I'm
also not sure how Heikki's MVCC-safe cluster would work if it can't be sure
what order it's scanning the heap in.
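
On the point quoted above that an indexscan can win when the table is roughly
in index order already, a rough Python sketch may help. It is only a toy model
under stated assumptions: the heap is a list of keys, pages hold a fixed
number of tuples, and "page switches" is a crude proxy for random heap I/O
that ignores caching. The function name and the sizes are hypothetical.

```python
import random

def heap_page_switches(heap_keys, tuples_per_page):
    """Count how often a scan in index order moves to a different heap page."""
    # Index order: heap positions sorted by key (ties broken by position).
    order = sorted(range(len(heap_keys)), key=lambda i: (heap_keys[i], i))
    switches, last_page = 0, None
    for pos in order:
        page = pos // tuples_per_page
        if page != last_page:
            switches += 1
            last_page = page
    return switches

n, per_page = 10_000, 100          # assumed sizes: 100 heap pages in total
clustered = list(range(n))         # heap already in index order
shuffled = clustered[:]
random.Random(0).shuffle(shuffled) # heap in random order, fixed seed

print(heap_page_switches(clustered, per_page))  # 100: each page read once
print(heap_page_switches(shuffled, per_page))   # close to one switch per tuple
```

In this model a sequential scan always touches each of the 100 pages once, so
the index-order scan matches it only in the clustered case.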

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
