Skip site navigation (1) Skip section navigation (2)

Re: Sequential Scans On Complex Query With UNION - see why this fails

From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: Steve Tucknott <steve(at)retsol(dot)co(dot)uk>
Cc: PostGreSQL <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Sequential Scans On Complex Query With UNION - see why this fails
Date: 2006-01-11 23:37:35
Message-ID: 11F9E71F-F6CC-455B-ABC0-9961347410BC@myrealbox.com (view raw or flat)
Thread:
Lists: pgsql-novice
Steve,

I don't have any explanation as to why your query is using an index  
in one case and a sequential scan in another. However, I do have a  
few observations after looking at your code that you might find helpful.

On Jan 2, 2006, at 3:10 , Steve Tucknott wrote:

> I am obviously doing something wrong in the structure of the query  
> - any ideas?

When I first saw your FROM clause, I didn't even know that moving the  
ON clause away from the JOIN it was related to would even work. I was  
surprised to find that it does! I find it quite counterintuitive to  
separate a join and its condition. As I understand it,

select *
from a
join b
join c on c.b_name = b.b_name
on b.a_name = a.a_name;

is just another way of writing

select *
from a
join b on b.a_name = a.a_name
join c on c.b_name = b.b_name;

and doesn't make a difference in query planning or performance. It's  
definitely not 'hierarchical' as I understand it. Comparing EXPLAIN  
ANALYZE output for the two query forms will show you if this is the  
case.

In the queries you attached, I saw that you mix restrictions (e.g.,  
c.b_name = 'some value') and join conditions (e.g., c.b_name =  
b.b_name) in both the FROM clause and the WHERE clause. I personally  
find this quite hard to read, though I don't think it makes much  
difference to the server. I believe both join conditions and  
restrictions are rewritten as they'd appear in the WHERE clause, so  
you *could* put them all in the WHERE clause.

However, I find it useful to separate them out, and build queries  
using JOIN ... ON (and actually I prefer the JOIN ... USING syntax  
because only one set of join columns appears in the result). This  
helps me clearly see how the tables are related. I then place all  
restrictions in the WHERE clause. This helps me know where to look in  
the query to find what I'm looking for. The optimizer should be able  
to perform restrictions as necessary—not necessarily in the order the  
original query was written. After all, that's the reason for its  
existence :).

I also noticed was that you've used column numbers rather than column  
names in your ORDER BY clause. I recommend using column names, as it  
makes it easier to really see what you're ordering by without having  
to refer to the SELECT target list. (In relational theory attribute  
order does not matter. The SQL standard *does* refer to column order,  
but you can go a long way and not have to worry about it, other than  
for UNION compatibility, and even then, you don't need to use column  
numbers explicitly.)

I've provided a rewritten query at the bottom of this post. I'd be  
interested to see if it performs any differently that your original  
query (e.g., how their EXPLAIN ANALYZE output compares).

The output also includes quite a few columns with duplicate column  
names (e.g., clnt.name, clntB.name; srcD.recNo, clnt.recNo,  
clntB.recNo, srcU.recNo). If feasible, I'd use AS to provide unique  
column names, as this can make it easier to reference the columns  
either in your client application or if you use this query as a  
subquery in another statement. (I haven't done this below. There  
could be a number of additional duplicates lurking in sProd.*,  
prod.*, VAT.*, and mrkUP.* .)

What does the 1 = 1 do in your WHERE clause?

Also, the mixed case column and table names you use get lowered in  
PostgreSQL, but you've probably already noticed that. To maintain  
case, you need to double-quote these identifiers.

> Also,...as a quick 'method' question..when writing embedded joins,  
> which syntax is easier for the optimiser? Should you do:

EXPLAIN ANALYZE will help you determine which form is more efficient  
in your situation. Though as I've indicated above, I'd rewrite the  
queries to make it easier for the programmer. Start by writing your  
query as you think it *should* be written to make it easiest for  
*you* to use. Then see how it performs. If you find it doesn't  
perform well, you'll have a baseline against which you can compare  
any improvement, and you'll learn what does and does not work. The  
optimizer is pretty good—don't second guess it. Code should be easy  
for the programmer to read and use first.

Hope this helps!

Michael Glaesemann
grzm myrealbox com

--- original query

--- rewritten query
SELECT clnt.name
     , clntB.name
     , periodOf(pOHdr.orderDate)
     , srcD.recNo
     , pODet.purchaseOrdHdrRecNo
     , clnt.recNo
     , clntB.recNo
     , srcU.recNo, srcU.replacementCost, srcU.requiredQty
     , sProd.*
     , prod.*
     , VAT.*
     , pODet.orderedQty
     , mrkUp.*
FROM purchaseOrdHdr AS pOHdr
JOIN purchaseOrdDet AS pODet ON (pODet.purchaseOrdHdrRecNo =  
pOHdr.recNo)
JOIN sourceDetUpgr AS srcU ON (srcU.recNo = pODet.sourceDetUpgrRecNo)
JOIN sourceDet AS srcD ON (srcD.recNo = srcU.sourceDetRecNo)
JOIN sourceDetExtref AS srcE ON (srcE.foreignRecNo = srcD.recNo)
JOIN clientBranch AS clntB ON (clntB.recNo = srcE.ownerForeignRecNo)
JOIN client AS clnt ON (clntB.clientRecNo = clnt.recNo)
JOIN markUp AS mrkUp ON (mrkUp.foreignRecNo = clnt.recNo)
JOIN supplierProduct AS sProd ON (sProd.recNo =  
srcU.supplierProductRecNo)
JOIN product AS prod ON (prod.recNo = sProd.productRecNo)
JOIN VAT AS VAT ON (sProd.VATRecNo = VAT.recNo)
WHERE pOHdr.orderDate BETWEEN '01/12/2005' AND '21/12/2005'
     AND  1=1 -- what does this do?
     AND pODet.lineStatus != 'V'
     AND srcD.actionStatus != 'V'
     AND srcD.serviceCoBranchRecNo =  2
     AND srcU.lineStatus != 'V'
     AND srcE.tableName = 'sourcedet' -- moved down from join condition
     AND srcE.ownerForeignTableName = 'clientbranch' -- moved down  
from join condition
     AND mrkUp.foreignTableName = 'client' -- moved down from join  
condition

UNION

SELECT clnt.name
     , clntB.name
     , periodOf(srcD.enteredDate)
     , srcD.recNo
     , NULL
     , clnt.recNo
     , clntB.recNo
     , srcU.recNo, srcU.replacementCost, srcU.requiredQty
     , sProd.*
     , prod.*
     , VAT.*
     , 0
     , mrkUp.*
FROM sourceDet AS srcD
JOIN sourceDetUpgr AS srcU ON (srcU.sourceDetRecNo = srcD.recNo)
JOIN supplierProduct AS sProd ON (srcU.supplierProductRecNo =  
sProd.recNo)
JOIN product AS prod ON (sProd.productRecNo = prod.recNo)
JOIN VAT AS VAT ON (sProd.VATRecNo = VAT.recNo)
JOIN sourceDetExtref AS srcE ON (srcE.foreignRecNo = srcD.recNo)
JOIN clientBranch AS clntB ON (clntB.recNo = srcE.ownerForeignRecNo)
JOIN client AS clnt ON (clntB.clientRecNo = clnt.recNo)
JOIN markUp AS mrkUp ON (mrkUp.foreignRecNo = clnt.recNo)
WHERE srcD.enteredDate BETWEEN '01/12/2005' AND '21/12/2005'
     AND  srcD.serviceCoBranchRecNo = 2
     AND srcD.actionStatus != 'V'
     AND srcU.lineStatus = 'S'
     AND mrkUp.foreignTableName = 'client' -- from join condition
     AND srcE.tableName = 'sourcedet' -- from join condition
     AND srcE.ownerForeignTableName = 'clientbranch' -- from join  
condition
     AND (
             SELECT COUNT(*)
             FROM sourceDetUpgr AS srcU2
             WHERE srcU2.sourceDetRecNo = srcD.recNo
                 AND srcU2.lineStatus IN ('S','T')
         ) = (
             SELECT COUNT(*)
             FROM sourceDetUpgr AS srcU2
             WHERE srcU2.sourceDetRecNo = srcD.recNo
                 AND srcU2.lineStatus != 'V'
         )
AND  1=1 -- what does this do?
ORDER BY  --3,2,1,4,5
     periodof
     , clntB.name
     , clnt.name
     , srD.recNo
     , pODet.purchaseOrdHdrRecNo


--- original query
SELECT clnt.name,clntB.name,periodOf 
(pOHdr.orderDate),srcD.recNo,pODet.purchaseOrdHdrRecNo,clnt.recNo,clntB. 
recNo,srcU.recNo,srcU.replacementCost,srcU.requiredQty,sProd.*,prod.*,VA 
T.*,pODet.orderedQty,mrkUp.*
FROM purchaseOrdHdr AS pOHdr
     JOIN purchaseOrdDet AS pODet
          JOIN sourceDetUpgr AS srcU
               JOIN sourceDet AS srcD
                    JOIN sourceDetExtref AS srcE
                         JOIN clientBranch AS clntB
                              JOIN client AS clnt
                                   JOIN markUp AS mrkUp
                                   ON  mrkUp.foreignRecNo = clnt.recNo
                                   AND mrkUp.foreignTableName = 'client'
                              ON  clntB.clientRecNo   = clnt.recNo
                         ON  clntB.recNo          =  
srcE.ownerForeignRecNo
                    ON  srcE.foreignRecNo           = srcD.recNo
                    AND srcE.tableName              = 'sourcedet'
                    AND srcE.ownerForeignTableName  = 'clientbranch'
               ON  srcD.recNo = srcU.sourceDetRecNo
               JOIN supplierProduct AS sProd
                    JOIN product AS prod
                    ON prod.recNo  = sProd.productRecNo
                       JOIN VAT AS VAT
                       ON  sProd.VATRecNo          = VAT.recNo
               ON  sProd.recNo     = srcU.supplierProductRecNo
          ON  srcU.recNo       = pODet.sourceDetUpgrRecNo
     ON  pODet.purchaseOrdHdrRecNo    = pOHdr.recNo
WHERE pOHdr.orderDate BETWEEN '01/12/2005' AND '21/12/2005'
AND  1=1
AND pODet.lineStatus            != 'V'
AND srcD.actionStatus           != 'V'
AND srcD.serviceCoBranchRecNo    =  2
AND srcU.lineStatus         != 'V'
UNION
SELECT clnt.name,clntB.name,periodOf 
(srcD.enteredDate),srcD.recNo,NULL,clnt.recNo,clntB.recNo,srcU.recNo,
        srcU.replacementCost,srcU.requiredQty,sProd.*,prod.*,VAT.*, 
0,mrkUp.*
FROM sourceDet AS srcD
      JOIN sourceDetUpgr AS srcU
           JOIN supplierProduct AS sProd
                JOIN product AS prod
                ON sProd.productRecNo       = prod.recNo
                JOIN VAT AS VAT
                ON  sProd.VATRecNo          = VAT.recNo
           ON  srcU.supplierProductRecNo    = sProd.recNo
      ON  srcU.sourceDetRecNo   = srcD.recNo
      JOIN sourceDetExtref AS srcE
           JOIN clientBranch AS clntB
                JOIN client AS clnt
                     JOIN markUp AS mrkUp
                     ON  mrkUp.foreignRecNo = clnt.recNo
                     AND mrkUp.foreignTableName = 'client'
                ON  clntB.clientRecNo   = clnt.recNo
           ON  clntB.recNo              = srcE.ownerForeignRecNo
      ON  srcE.foreignRecNo           = srcD.recNo
      AND srcE.tableName              = 'sourcedet'
      AND srcE.ownerForeignTableName  = 'clientbranch'
WHERE srcD.enteredDate BETWEEN '01/12/2005' AND '21/12/2005'
AND  srcD.serviceCoBranchRecNo    = 2
AND   srcD.actionStatus            != 'V'
AND   srcU.lineStatus           = 'S'
AND   (SELECT COUNT(*)  FROM sourceDetUpgr AS srcU2
        WHERE srcU2.sourceDetRecNo = srcD.recNo
        AND   srcU2.lineStatus IN ('S','T')) = (SELECT COUNT(*) FROM  
sourceDetUpgr AS srcU2
                                                WHERE  
srcU2.sourceDetRecNo = srcD.recNo
                                                AND    
srcU2.lineStatus != 'V')
AND  1=1
ORDER BY 3,2,1,4,5


In response to

Responses

pgsql-novice by date

Next:From: Brendan DuddridgeDate: 2006-01-11 23:39:25
Subject: Re: Intel Macs and PostgreSQL
Previous:From: Scott FordDate: 2006-01-11 21:21:45
Subject: Re: Removing duplicate entries

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group