BUG #1435: Optimizer not using index on large tables when inner joining two views

From: "Yary Hluchan" <not(dot)com(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #1435: Optimizer not using index on large tables when inner joining two views
Date: 2005-01-23 08:57:07
Message-ID: 200501230857.j0N8v71Z011375@developer.pgadmin.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 1435
Logged by: Yary Hluchan
Email address: not(dot)com(at)gmail(dot)com
PostgreSQL version: 8.0.0
Operating system: OpenBSD 3.6
Description: Optimizer not using index on large tables when inner
joining two views
Details:

There's a bit of background here, please bear with me-

The PostgreSQL optimizer is failing with some queries on the MusicBrainz
database ( http://www.musicbrainz.org/products/server/download.html - though
it will be easier to get a data dump directly from me). The base tables
involved are:

moderation_open : 6047 rows
moderation_closed : 1502267 rows
vote_open : 13811 rows
vote_closed : 2116153 rows

The only difference between the "open" and "closed" tables are the data
stored within them- other than the titles, their definitions and indexes are
the same. The vote tables have a column "moderation" which join with the
moderation tables' "id" column. The vote tables also have a column
"moderator" which represents the voter. All tables have other columns which
aren't relevant for this bug report.

CREATE TABLE moderation_<open,closed>
( id INTEGER NOT NULL,
... )

CREATE TABLE vote_<open,closed>
( id INTEGER NOT NULL,
moderator INTEGER NOT NULL, -- references moderator
moderation INTEGER NOT NULL, -- references moderation
... )

There are views, "moderation_all" and "vote_all", that combine the data in
the open & closed tables

CREATE VIEW moderation_all AS
SELECT * FROM moderation_open
UNION ALL
SELECT * FROM moderation_closed;

CREATE VIEW vote_all AS
SELECT * FROM vote_open
UNION ALL
SELECT * FROM vote_closed;

The bug appears with this query:
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_all m INNER JOIN vote_all v ON
v.moderation = m.id AND v.moderator = 85981
AND NOT v.superseded
ORDER BY m.id DESC

I've put it and the query plan at http://yary.ack.org/qp.txt (all query
plans shown are from a newly vacuumed and analyzed pg 8.0.0 database). Query
plan 1 shows sequential scans on the moderation_* tables

If I SET ENABLE_SEQSCAN TO OFF the new plan still does sequential scans on
the moderation tables, and still takes about ten minutes wall time on my
system. See Query Plan 2.

I then set ENABLE_SEQSCAN TO ON and run a test to see if there is a problem
with the indexes on the moderation tables-
explain SELECT m.* FROM moderation_all m where m.id = 2045296;

it indeed does use the indexes, and the query returns quickly. (See "test qp
3")

I then try some rewrites of the query that don't use the moderation_all
view. The rewrites return identical data as the original and run very
quickly, and don't have optimizer plans using sequential scans on the base
tables. I've tried to make them identical to the original query.

rewrite 1-
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_open m INNER JOIN vote_all v ON
v.moderation = m.id AND v.moderator = 85981
AND NOT v.superseded
UNION ALL
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_closed m INNER JOIN vote_all v ON
v.moderation = m.id AND v.moderator = 85981
AND NOT v.superseded
ORDER BY id DESC

My reading of it is that the optimizer isn't applying a distributive rule:
condition(a union b) == condition(a) union condition(b). It knows the
subquery is going to be compared with m.id, but it doesn't seem to try
moving that comparison into the component subqueries. Or maybe there's some
re-arrangement of the query tree it's just plain missing. I'll limit my
speculation.

I can provide any data dumps or run any tests you request.

Thanks for looking

-y

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2005-01-23 16:01:14 Re: inconsitent tab completion in psql
Previous Message Michael Fuhr 2005-01-23 03:54:12 Re: 8.0.0 pg_restore -L doesn't restore ACLs