Skip site navigation (1) Skip section navigation (2)

Outer joins and equivalence

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Outer joins and equivalence
Date: 2008-05-27 19:59:18
Message-ID: 1211918358.4489.257.camel@ebony.site (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
I have a complex query where making a small change to the SQL increases
run-time by > 1000 times.

The first SQL statement is of the form

A JOIN B ON (a.id = b.id) LEFT JOIN C ON (a.id = c.id) 

and the second is like this

A JOIN B ON (a.id = b.id) LEFT JOIN C ON (b.id = c.id)

the only difference is the substitution of a -> b

This has been verified by examining EXPLAIN of SQL1, SQL2 and SQL1. The
first and third EXPLAINs are equivalent. All ANALYZE etc has been run.
All relevant join columns are INTEGERs. So we have a repeatable
difference in plans attributable to a single change.

The difference in run time occurs because the second form of the query
uses a SeqScan of a large table, whereas the first form is able to use a
nested loops join to access the large table, which then allows it to
access just 3 rows rather than 85 million rows.

There is a clear equivalence between the two forms of SQL, since the
equivalence a = b is derived from a natural rather than an outer join.
This can be applied from the left side to the right side of the join. 

So this looks to me like either a bug or just an un-implemented
optimizer feature. The code I've just looked at for equivalent class
relationships appears to refer to using this to propagate constant info
only, so I'm thinking it is not a bug. and hence why it is reported here
and not to pgsql-bugs.

I do recognise that we would *not* be able to deduce this form of SQL

A JOIN B ON (a.id = c.id) LEFT JOIN C ON (b.id = c.id)

though that restriction on outer join equivalence is not relevant here.

(SQL, EXPLAINs etc available off-list only, by request).

I'm looking into this more now.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


Responses

pgsql-performance by date

Next:From: Simon RiggsDate: 2008-05-27 20:54:26
Subject: Re: I/O on select count(*)
Previous:From: hubert depesz lubaczewskiDate: 2008-05-27 14:58:19
Subject: Re: [GENERAL] select query takes 13 seconds to run with index

pgsql-hackers by date

Next:From: Peter EisentrautDate: 2008-05-27 20:49:13
Subject: Re: ERRORDATA_STACK_SIZE panic crashes on Windows
Previous:From: Simon RiggsDate: 2008-05-27 19:35:51
Subject: Hint Bits and Write I/O

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group