Re: Strange (?) Index behavior?

From: Allen Landsidel <alandsidel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange (?) Index behavior?
Date: 2004-11-05 21:02:43
Message-ID: 88f1825a04110513028759dd3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 5 Nov 2004 18:34:23 -0000, Matt Clark <matt(at)ymogen(dot)net> wrote:
> > With that many rows, and a normal index on the field,
> > postgres figures the best option for say "I%" is not an index
> > scan, but a sequential scan on the table, with a filter --
> > quite obviously this is slow as heck, and yes, I've run
> > analyze several times and in fact have the vacuum analyze automated.
>
> Ah, so "like 'I%'" uses a very slow seq scan, but "like 'ABC%'" uses an
> ordinary index OK? If so then...

That is correct.

> The planner would usually assume (from what Tom usually says) that 1/26
> selectivity isn't worth doing an index scan for, but in your case it's wrong
> (maybe because the rows are very big?)

The rows aren't big, it's a text field, a few ints, and a few
timestamps. That's all. The text field is the one we're querying on
here and lengthwise it's typically not over 32 chars.

> You may be able to get the planner to go for an index scan on "like 'I%'" by
> tweaking the foo_cost variables in postgresql.conf

That's true but I'd rather not, there are times when the seqscan will
have a faster net result (for other queries) and I'd rather not have
them suffer.

> Or you could have the app rewrite "like 'I%'" to "like 'IA%' or like 'IB%'
> ... ", or do that as a stored proc.

Holy cow. Yeah that seems a little outrageous. It would be cleaner
looking in "\d table" than having all these indexes at the cost of
having one very ugly query.

> > With the partial index the index scan is used and the cost
> > drops from 0..2million to 0..9000 -- a vast improvement.
>
> So there are really only 9000 rows out of 76 million starting with 'I'? How
> about combining some techniques - you could create an index on the first two
> chars of the field (should be selective enough to give an index scan),
> select from that, and select the actual data with the like clause.

I was talking about the cost, not the number of rows. About 74,000
rows are returned but the query only takes about 8 seconds to run. --
with the partial index in place.

> CREATE INDEX idx_firstletters ON table (substr(field, 1, 2));
> CREATE INDEX idx_all ON table (field);
> SELECT field FROM (SELECT field FROM table WHERE substr(field, 1, 2) = 'DE')
> AS approx WHERE field LIKE 'DE%';

That looks like a pretty slick way to create an index, I didn't know
there was such a way to do it.. but It appears that this will not work
with queries where the WHERE clause wants to find substrings longer
than 2 characters.

I will give it a try and see how it goes though I think I'm fairly
"settled" on creating all the other indexes, unless there is some
specific reason I shouldn't -- they are used in all cases where the
substring is >= 1 character, so long as I make sure the first where
clause (or inner select in a subquery) is the most ambiguous from an
index standpoint.

Going back to the initial problem -- having only one large, complete
index on the table (no partial indexes) the query "SELECT field FROM
table WHERE field LIKE 'A%';" does not use the index. The query
"SELECT field FROM table WHERE field LIKE 'AB%';" however, does use
the single large index if it exists.

Adding the partial index "CREATE INDEX idx_table_substrfield_A ON
table (field) WHERE field LIKE 'A%';" causes all queries with
substrings of any length to do index scans.provided I issue the query
as:

SELECT field FROM table WHERE field LIKE 'A%' AND field LIKE 'AB%';
-- or even --
SELECT field FROM table WHERE field LIKE 'A%';

The latter query, without the partial index described, does a
sequential scan on the table itself instead of an index scan.

-Allen

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2004-11-05 21:08:56 Re: Strange (?) Index behavior?
Previous Message Gabriele Bartolini 2004-11-05 21:00:16 Question regarding the file system