Possible bug in PostgreSQL query planner

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: pgsql-bugs(at)postgresql(dot)org
Cc: matthew(at)flymine(dot)org, <andy(at)flymine(dot)org>
Subject: Possible bug in PostgreSQL query planner
Date: 2002-10-04 16:47:16
Message-ID: Pine.LNX.4.44.0210041521180.1333-300000@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


I have encountered what seems to be a bad decision made by the query
planner, on a quite simple query. I have attached a shell script which
creates a large (180MB) sql script, to demonstrate the problem. To
reproduce, execute the bash script, and pipe the stdout into psql.
Redirect the stdout of psql somewhere, as it isn't very interesting. psql
produces several EXPLAIN ANALYSE results on its stderr. I have also
attached a copy of the results I get.

The test creates three tables:
jointest3_1 (keya int, infoa int)
jointest3_2 (keya int, keyb int)
jointest3_3 (keyb int, infob int)

It then fills tables jointest3_1 and jointest3_3 each with 1 million rows,
where the key value is sequential (0 to 999999), and the info value is a
random zero or positive integer below 10000.

It then fills table jointest3_2 with 2 million rows, each with random
6-digit integers in both of its columns.

Therefore, table jointest3_2 represents a many-to-many mapping between
jointest3_1 and jointest3_3, and the natural join of all three tables
should have 2 million rows.

The test then creates lots of indexes, clusters on the indexes, and does a
vacuum analyse.

The test does six EXPLAIN ANALYSE commands. There are actually three
distinct commands, each of which is performed twice, to get a
before-cached and after-cached performance indication. The first command
does a natural join of all three tables, and restricts the results on
jointest3_1.infoa. This is performed quickly, as there is an index on
infoa which returns approximately 100 rows.

The second distinct command also does a natural join of all three tables,
but restricts the results by jointest3_3.infob. The query planner appears
to make a bad decision, and the query takes a long time to return.

The third distinct command does the same query as the second distinct
command, but it forces the query planner to join tables jointest3_2 and
jointest3_3 together before joining with jointest3_1 by doing a subselect.
This performs as quickly as the first distinct command.

I would have expected the query planner to find the fast query plan, given
there are only three tables being joined together.

I am using PostgreSQL version 7.2.1, running on Debian unstable/testing,
reasonably up to date. If you need more details, please email back.
select version(); gives the following:
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4
PostgreSQL was installed as a Debian package.

The machine is a Athlon XP1900, with 512MB of RAM, a 40GB hard drive.
Kernel: Linux 2.4.19-686
libc6: 2.2.5-14.1

Matthew Wakeling

Attachment Content-Type Size
insert1M_jointest3 text/plain 2.4 KB
insert1M_jointest3_results2 text/plain 4.5 KB

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2002-10-04 17:54:30 Re: 7.3B2 initdb fails
Previous Message Bruce Momjian 2002-10-04 16:05:04 Re: Bug in Function-Transactions?