Optimizer problem with multi-column index

From: Marc Schablewski <ms(at)clickware(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Optimizer problem with multi-column index
Date: 2013-04-29 11:06:29
Message-ID: 517E5435.9060801@clickware.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello,

we have an issue concerning multi-column indexes not being used by the planner.

version
------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.3 on x86_64-unknown-linux-gnu, compiled by gcc (SUSE Linux) 4.7.2 20130108
[gcc-4_7-branch revision 195012], 64-bit

(same has been tested on postgres 8.4 with exactly the same results)

default_statistics_target is set to 1000.

The following is a detailed description, completely sufficient to reproduce the problem.

consider the following table:

create sequence plannertest_seq;

create table plannertest
(
id bigint not null default nextval( 'plannertest_seq' ),
xyz_id bigint not null default random() * 9 + 1,
datum timestamp not null
);

with these 2 indexes:

create index plannertest_datum on plannertest( datum );
create index plannertest_xyz_id_datum on plannertest( xyz_id, datum );

we now insert data into the table using the following statement:

insert into plannertest( datum ) select datum from generate_series( '2010-01-01 00:00', now(),
interval '1 minute' );
INSERT 0 1748655

analyze plannertest;

then we issue the following select:

select datum from plannertest where xyz_id = 3 and datum >= '2012-10-25 06:00:00' and Datum <
'2013-01-27 06:00:00';

which yields the following plan:

------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using plannertest_datum on plannertest (cost=0.00..5444.83 rows=15164 width=24)
Index Cond: ((datum >= '2012-10-25 06:00:00'::timestamp without time zone) AND (datum <
'2013-01-27 06:00:00'::timestamp without time zone))
Filter: (xyz_id = 3)

so the optimizer here chooses the 1-column index on column "datum" even though the 2-column index
would be 10 times as selective.

we now drop the 1-column index.

drop index plannertest_datum;

and then issue the same statement again.

now we get the following plan:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on plannertest (cost=630.09..12623.71 rows=15164 width=24)
Recheck Cond: ((xyz_id = 3) AND (datum >= '2012-10-25 06:00:00'::timestamp without time zone) AND
(datum < '2013-01-27 06:00:00'::timestamp without time zone))
-> Bitmap Index Scan on plannertest_xyz_id_datum (cost=0.00..626.30 rows=15164 width=0)
Index Cond: ((xyz_id = 3) AND (datum >= '2012-10-25 06:00:00'::timestamp without time zone)
AND (datum < '2013-01-27 06:00:00'::timestamp without time zone))

this is really funny. Apparently, in the first case, where the 1-column index (which does NOT
contain the column to be filtered) is used, the planner forgets about the necessity to visit every
tuple in order to do the filtering on the "xyz" column.

In the second case, where everything needed is included in the index, however, the planner thinks it
has to use an absolutely unnecessary bitmap heap scan. A simple forward index scan would do here.

BTW, exactly the same plans are output when I do a "select *".


Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message matti.aarnio 2013-04-29 11:23:55 BUG #8127: After failed insert a select to figure out what failed is rejected
Previous Message whiplash 2013-04-29 11:00:56 Re: BUG #8118: Wrong sorting text