Use of multipart index with "IN"

From: Rob Messer <rrmesser(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Use of multipart index with "IN"
Date: 2003-10-23 18:18:31
Message-ID: 20031023181831.40095.qmail@web41215.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a reporting system that does regular queries on a table with a
multipart index. I am running version 7.3.4. Here is the table
definition:

Table "public.ds_rec_fld"
Column | Type | Modifiers
---------------+-------------------------+-----------
dsid | character varying(20) | not null
recid | integer | not null
field_name | character varying(20) | not null
option_tag | character varying(10) | not null
option_value | integer |
field_text | character varying(2000) |
field_type_cd | character varying(8) |
Indexes: ds_rf_ndx1 btree (recid, field_name, option_value)

Normally queries are done using recid and field_name, so Postgresql
returns rows very quickly as expected. Here is a sample explain
analyze output for a typical query:

db=> explain analyze
db-> select field_name, option_tag from ds_rec_fld where recid = 3000
and field_name = 'Q3A1';
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
Index Scan using ds_rf_ndx1 on ds_rec_fld (cost=0.00..163.09 rows=40
width=38) (actual time=0.06..0.07 rows=1 loops=1)
Index Cond: ((recid = 3001) AND (field_name = 'Q3A1'::character
varying))
Total runtime: 0.12 msec
(3 rows)

The problem comes in when we are selecting multiple field_name values
in one query. The normal SQL syntax we have been using is like this:

select field_name, option_tag from ds_rec_fld where recid = 3001 and
field_name in ('Q3A1', 'Q3A9');

This is just a simplified example, at times there can be a lot of
field_name values in one query in the "in" clause. Here postgresql
refuses to use the full index, instead doing a filter based on part of
the first recid part of index. Here is the explain analyze output:

Index Scan using ds_rf_ndx1 on ds_rec_fld (cost=0.00..30425.51
rows=80 width=38) (actual time=0.18..1.08 rows=2 loops=1)
Index Cond: (recid = 3001)
Filter: ((field_name = 'Q3A1'::character varying) OR (field_name =
'Q3A9'::character varying))
Total runtime: 1.12 msec
(4 rows)

So, 10 times longer. This is an issue because at times we are
iterating through thousands of recid values.

I did a vacuum analyze, adjusted random_page_cost, etc. all to no
avail.

I also noticed that the problem goes away when I reformat the query
like this:

select field_name, option_tag from ds_rec_fld where
(recid = 3001 and field_name = 'Q3A1') or
(recid = 3001 and field_name = 'Q3A9')

Here is the explain analyze output for this:

Index Scan using ds_rf_ndx1, ds_rf_ndx1 on ds_rec_fld
(cost=0.00..326.57 rows=80 width=38) (actual time=0.07..0.10 rows=2
loops=1)
Index Cond: (((recid = 3001) AND (field_name = 'Q3A1'::character
varying)) OR ((recid = 3001) AND (field_name = 'Q3A9'::character
varying)))
Total runtime: 0.16 msec
(3 rows)

Much better. So I have partially solved my own problem, but there are
other places that this is not this simple to fix.

Therefore, my question is, is there some way to force postgresql to use
the full index and still stick with the shorter "field_name in ('...',
'...')" syntax?

If anyone has any thoughts please let me know. Also it strikes me that
perhaps the optimizer could be tweaked to treat the first case like the
second one. Thanks in advance,

Rob

__________________________________
Do you Yahoo!?
The New Yahoo! Shopping - with improved product search
http://shopping.yahoo.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Vivek Khera 2003-10-23 20:53:05 Re: vacuum locking
Previous Message Bruce Momjian 2003-10-23 16:56:32 Re: RedHat Enterprise Linux ES 3 ?!?!