Re: Inconsistant SQL results - Suspected error with query planing or query optimisation.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: adam terrey <a(dot)terrey(at)mackillop(dot)acu(dot)edu(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Inconsistant SQL results - Suspected error with query planing or query optimisation.
Date: 2007-05-22 15:53:06
Message-ID: 13885.1179849186@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

adam terrey <a(dot)terrey(at)mackillop(dot)acu(dot)edu(dot)au> writes at
http://archives.postgresql.org/pgsql-bugs/2007-05/msg00187.php

> Anyway, mybug: I have a test SELECT statement (Listing A - see "sql
> listings.txt") wich produces different results under two simular setups
> (Listing B and Listing C). Each setup should product the same result for
> the given SELECT statement.

The problem here is that 8.2 is incorrectly concluding that it can
rearrange the order of the two LEFT JOIN steps in the query:

SELECT a.id
FROM items a
LEFT JOIN (
SELECT b.id
FROM items b
LEFT JOIN (
SELECT c.id FROM items c WHERE number = 1
) AS moded_items USING (id)
WHERE moded_items.id IS NULL
) AS sub_items USING (id)
WHERE sub_items.id IS NULL;

The plan it comes up with is:

Nested Loop Left Join (cost=469.00..1063.39 rows=1 width=4) (actual time=288.962..288.962 rows=0 loops=1)
Filter: (c.id IS NULL)
-> Hash Left Join (cost=469.00..1063.00 rows=1 width=8) (actual time=288.946..288.946 rows=0 loops=1)
Hash Cond: (a.id = b.id)
Filter: (b.id IS NULL)
-> Seq Scan on items a (cost=0.00..344.00 rows=10000 width=4) (actual time=0.080..50.973 rows=10000 loops=1)
-> Hash (cost=344.00..344.00 rows=10000 width=4) (actual time=140.880..140.880 rows=10000 loops=1)
-> Seq Scan on items b (cost=0.00..344.00 rows=10000 width=4) (actual time=0.046..69.395 rows=10000 loops=1)
-> Index Scan using items_pkey on items c (cost=0.00..0.38 rows=1 width=4) (never executed)
Index Cond: (b.id = c.id)
Filter: (c.number = 1)

After reducing join_collapse_limit to 1, we get the right join order and
the right answers:

Hash Left Join (cost=750.54..1132.05 rows=1 width=4) (actual time=409.712..409.740 rows=2 loops=1)
Hash Cond: (a.id = b.id)
Filter: (b.id IS NULL)
-> Seq Scan on items a (cost=0.00..344.00 rows=10000 width=4) (actual time=0.100..51.052 rows=10000 loops=1)
-> Hash (cost=750.52..750.52 rows=1 width=4) (actual time=264.978..264.978 rows=9998 loops=1)
-> Hash Left Join (cost=369.01..750.52 rows=1 width=4) (actual time=30.074..192.023 rows=9998 loops=1)
Hash Cond: (b.id = c.id)
Filter: (c.id IS NULL)
-> Seq Scan on items b (cost=0.00..344.00 rows=10000 width=4) (actual time=0.030..50.913 rows=10000 loops=1)
-> Hash (cost=369.00..369.00 rows=1 width=4) (actual time=29.976..29.976 rows=2 loops=1)
-> Seq Scan on items c (cost=0.00..369.00 rows=1 width=4) (actual time=29.896..29.916 rows=2 loops=1)
Filter: (number = 1)

So there is something wrong with the rule used for deciding whether two
LEFT JOINs can commute. Per the planner README:

: The planner's treatment of outer join reordering is based on the following
: identities:
:
: 1. (A leftjoin B on (Pab)) innerjoin C on (Pac)
: = (A innerjoin C on (Pac)) leftjoin B on (Pab)
:
: where Pac is a predicate referencing A and C, etc (in this case, clearly
: Pac cannot reference B, or the transformation is nonsensical).
:
: 2. (A leftjoin B on (Pab)) leftjoin C on (Pac)
: = (A leftjoin C on (Pac)) leftjoin B on (Pab)
:
: 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc)
: = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
:
: Identity 3 only holds if predicate Pbc must fail for all-null B rows
: (that is, Pbc is strict for at least one column of B). If Pbc is not
: strict, the first form might produce some rows with nonnull C columns
: where the second form would make those entries null.

What we have here is an invocation of rule 3 in a situation where it's
not appropriate. The difficulty is that the code is only paying
attention to the syntactical JOIN/ON clauses and has neglected the
intermediate-level WHERE clause.

After a bit of reflection it seems that a WHERE that is semantically
just below a left-join's right side can be treated as if it were part of
that left-join's ON clause. It will have the same effect as if it had
been written there: any rows rejected by the WHERE will fail to be
joined to the left side and will contribute nothing to the result.
Had we been following this rule, we'd have concluded that c.id IS NULL
is part of the upper join qual, and therefore that it has a predicate
Pabc not just Pab and cannot be commuted with the lower join.

Teaching initsplan.c to do things this way seems possible but less than
trivial. Before I start worrying about that, does anyone see any flaws
in the reasoning at this level of detail?

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Phil Frost 2007-05-22 16:47:57 Re: BUG #3297: psql won't open
Previous Message Alvaro Herrera 2007-05-22 15:12:26 Re: Inconsistant SQL results - Suspected error with query planing or query optimisation.

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2007-05-22 15:58:33 like/ilike improvements
Previous Message Alvaro Herrera 2007-05-22 15:48:38 Re: CREATE TABLE LIKE INCLUDING INDEXES support