Skip site navigation (1) Skip section navigation (2)

Inefficient queryplan for query with intersectable subselects/joins

From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Inefficient queryplan for query with intersectable subselects/joins
Date: 2005-08-26 11:08:52
Message-ID: 430EF844.70905@tweakers.net (view raw or flat)
Thread:
Lists: pgsql-performance
Hi list,

I'm writing an application that will aggregate records with a few 
million records into averages/sums/minimums etc grouped per day.

Clients can add filters and do lots of customization on what they want 
to see. And I've to translate that to one or more queries. Basically, I 
append each filter as either an extra and-in-the-where or joined with 
the clauses as ON-clause. The application now uses 8.1devel but I think 
the basic plans are similar to 8.0. At least for this first query.

I noticed a query taking over 25 seconds to execute:

SELECT "make a timestamp" grouper, chart_2.Prijs as field_2_0
FROM
   pwprijs as chart_2
   JOIN pwprodukten t_0 ON chart_2.ProduktID = t_0.ID AND t_0.Cat2 IN
             (SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)
   JOIN pwprijs t_1 ON chart_2.ProduktID = t_1.ProduktID
       AND t_1.LeverancierID = 938 AND t_1.recordtimestamp >= "last 
timestamp"
WHERE
   chart_2.Prijs > 0

It yields quite a long plan, so I've send that as an attachment along.
Basically it combines two tables against an original to fetch "all 
prices (of all suppliers) for products of a certain category that are 
sold by a certain supplier".

I was wondering how rewriting it to subselects would improve 
performance, but that wasn't a very clear winner. It shaved of about 5 
seconds. So I took the subselects and used INTERSECT to unite them and 
have only one IN-clause in the query. That made it go down to around 13 
seconds.

I noticed it was doing a seq scan on the largest table to get the "Prijs 
 > 0"-condition. But since there are only 947 of the 7692207 with prijs 
= 0 and none with < 0, it shouldn't be the main thing to look for.
Dropping the clause made a minor improvement in performance for the queries.

But disabling sequential scans allowed an improvement to only 660 ms 
compared to the 13 seconds earlier! Row-estimates seem to be quite a bit 
off, so I already set the statistics target to 1000 and re-analyzed.
Btw, adding the prijs-clause again would make it choose another index 
and thus resulted in much longer operation.

The final query, only taking 650ms, would be:

SELECT
   "make a timestamp" as grouper,
   chart_2.Prijs as field_2_0
FROM
   pwprijs as chart_2
WHERE
   chart_2.ProduktID IN (SELECT ID FROM pwprodukten WHERE Cat2 IN 
(SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)
                           INTERSECT
                 SELECT produktid FROM pwprijs WHERE LeverancierID = 938 
AND recordtimestamp >= "last timestamp")

So I'm wondering: how can I make postgres decide to use the (correct) 
index without having to disable seq scans and how can I still add the 
prijs-clause without dropping the index for it (since it should be used 
for other filters). And for ease of use in my application I'd prefer to 
use the first query or the version with two seperate IN-clauses.

Is that possible?

I left all the configuration-stuff to the defaults since changing values 
didn't seem to impact much. Apart from the buffers and effective cache, 
increasing those made the performance worse.

Best regards,

Arjen

Attachment: query-plans.txt
Description: text/plain (21.6 KB)

Responses

pgsql-performance by date

Next:From: Richard HuxtonDate: 2005-08-26 13:05:23
Subject: Re: Inefficient queryplan for query with intersectable
Previous:From: Richard HuxtonDate: 2005-08-26 08:34:31
Subject: Re: postmaster memory keep going up????

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group