bug in query planning?

From: Steven D(dot)Arnold <stevena(at)neosynapse(dot)net>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: bug in query planning?
Date: 2003-12-22 02:00:17
Message-ID: 971FAF7E-3422-11D8-B5C3-000A95BA4396@neosynapse.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a query which does not use column indexes that it should use. I
have discovered some interesting behaviors of Postgres which may
indicate a bug in the database's query planning.

Take a look at the query below. There is a btree index on both
m.account_id and a.account_id. Query (1) does not use the index on the
messages table, instead opting for a full table scan, thus killing
performance. The messages table can contain potentially hundreds of
thousands or millions of rows. Even at 50,000, it's murder.

Query (2) below is the same query, but we reverse the order of the
tables. It's obviously not quite the same query semantically, even
though in my case it should always produce the same result. It is
interesting to note that it uses the indexes tho.

Finally, query (3) below uses traditional joining (non-ANSI). Indexes
are correctly used in that query. The suggestion is that Postgres does
not correctly analyze queries using ANSI joins. Indexes are
occasionally skipped when they should be used. This seems like a bug
in Postgres. I'm using version 7.3.4 of Postgres.

Thanks in advance for any comments...
steve

Query (1)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from messages m
defender-# left join accounts a
defender-# on m.account_id::bigint = a.account_id::bigint
defender-# where a.email = 'stevena(at)neosynapse(dot)net';
QUERY
PLAN
------------------------------------------------------------------------
--------------------------------------------------------------------
Aggregate (cost=20461.10..20461.10 rows=1 width=47) (actual
time=1420.09..1420.09 rows=1 loops=1)
-> Hash Join (cost=30.77..20334.38 rows=50687 width=47) (actual
time=0.51..1319.69 rows=51419 loops=1)
Hash Cond: ("outer".account_id = "inner".account_id)
Filter: ("inner".email = 'stevena(at)neosynapse(dot)net'::text)
-> Seq Scan on messages m (cost=0.00..19289.87 rows=50687
width=16) (actual time=0.06..703.89 rows=52541 loops=1)
-> Hash (cost=30.76..30.76 rows=3 width=31) (actual
time=0.40..0.40 rows=0 loops=1)
-> Index Scan using accounts_pkey on accounts a
(cost=0.00..30.76 rows=3 width=31) (actual time=0.17..0.38 rows=3
loops=1)
Total runtime: 1420.25 msec
(8 rows)

Query (2)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from accounts a
defender-# left join messages m
defender-# on a.account_id::bigint = m.account_id::bigint
defender-# where a.email = 'stevena(at)neosynapse(dot)net';

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
--------
Aggregate (cost=6806.54..6806.54 rows=1 width=24) (actual
time=792.14..792.14 rows=1 loops=1)
-> Nested Loop (cost=0.00..6764.30 rows=16896 width=24) (actual
time=0.38..718.12 rows=51419 loops=1)
-> Index Scan using accounts_email on accounts a
(cost=0.00..8.98 rows=1 width=8) (actual time=0.22..0.25 rows=1
loops=1)
Index Cond: (email = 'stevena(at)neosynapse(dot)net'::text)
-> Index Scan using messages_account_id on messages m
(cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..593.15
rows=51419 loops=1)
Index Cond: ("outer".account_id = m.account_id)
Total runtime: 792.33 msec
(7 rows)

Query (3)
=========
defender=# explain analyze
defender-# select count(message_id)
defender-# from messages m, accounts a
defender-# where m.account_id::bigint = a.account_id::bigint
defender-# and a.email = 'stevena(at)neosynapse(dot)net';

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
--------
Aggregate (cost=6806.54..6806.54 rows=1 width=24) (actual
time=782.30..782.30 rows=1 loops=1)
-> Nested Loop (cost=0.00..6764.30 rows=16896 width=24) (actual
time=0.33..708.52 rows=51422 loops=1)
-> Index Scan using accounts_email on accounts a
(cost=0.00..8.98 rows=1 width=8) (actual time=0.15..0.18 rows=1
loops=1)
Index Cond: (email = 'stevena(at)neosynapse(dot)net'::text)
-> Index Scan using messages_account_id on messages m
(cost=0.00..6544.13 rows=16896 width=16) (actual time=0.15..578.23
rows=51422 loops=1)
Index Cond: (m.account_id = "outer".account_id)
Total runtime: 782.46 msec
(7 rows)

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2003-12-22 04:47:47 Re: bug in query planning?
Previous Message Andrew Dunstan 2003-12-22 01:45:11 Re: [pgsql-advocacy] PostgreSQL speakers needed for OSCON