Re: visualizing B-tree index coverage

From: PFC <lists(at)boutiquenumerique(dot)com>
To: tjo(at)acm(dot)org
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: visualizing B-tree index coverage
Date: 2005-01-27 16:40:25
Message-ID: opsk9slnxrth1vuj@musicbox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> I realize that using OR will not result in an index scan.
> I will never be interested in a OR condition for the kinds
> of searches I use. In my Select statements, I always name
> every column of the multi-column index in same order that
> they were named when creating the index. I always use
> the >= condition, and very rarely, the = condition.

All the leftmost index column must be named, but the order is unimportant.
You can use (a BETWEEN x AND y) instead of (a>=x AND a<=y), it is cleaner.

> However, I am concerned that I must place
> the most selective column first in my index. I cannot tell,
> a priori, which column will be most selective. That depends on the
> nature of search, which can vary widely each time.
> Are you saying that if my first column is not selective, even though the
> remaining
> columns are, the planner may choose not to use the index after
> seeing that the first column is not very selective?

I thought this was true but made some tests and the index scanner is
smart.

Try this :
CREATE TABLE test (id serial primary key, a INTEGER, z INTEGER, e INTEGER,
r INTEGER, t INTEGER, y INTEGER ) WITHOUT OIDS;
INSERT 1M rows into table using a plpgsql function, with a,z,e,r,t,y being
floor(random()*10) for instance.

Then you can try various selects. a,z,e,r,t,y are a linear distribution
between 0 and 9 included, so :
a>=A AND z>=Z ... y>=Y gives a result set of about
(10-A)*(10-Z)*...*(10-Y) results. You'll see the planner will use an index
scan when needed. You can try the easiest case (a>=9) which just explores
one part of the tree, and the worst case which explores a part of all
leafs (y>=9). Both should yield about the same number of results, but the
first should be faster. To know how much, just try ;)

> That seems like an oversight, IMHO. Shouldn't the overall effect of
> using all the columns be considered before choosing not to use an
> index scan?

I think it is. There are no cross column correlation stats though.

> Since I'm using every column of my multi-column index for every search,
> and I always use >=, Explain Analyze always shows that every column
> is considered in the index scan. However, that is only when the
> index scan is used. Sometimes, Explain Analyze shows it is not used.
> That appears to happen when my search condition is very general.
> This it to be expected, so I am not worried. Most of my searches will
> be intermediate, namely not VERY selective, but also not VERY general.
> So the idea of the multi-column index is to "characterize" each row
> sufficiently, even when it is a perfectly ordinary row with no ONE
> feature being distinctive, but rather several features together giving
> it it's distinctive character. That is my interpretation of the
> multi-column index.

If you have some features which are highly selective, you can create a
single column index on them. It won't be used often, but when it will, it
will really work.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message MargaretGillon 2005-01-27 16:47:40 Re: pg_dump shell script with ~/.pgpass
Previous Message Greg Stark 2005-01-27 16:35:40 Re: visualizing B-tree index coverage