Inefficient queryplan for query with intersectable subselects/joins

From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Inefficient queryplan for query with intersectable subselects/joins
Date: 2005-08-26 11:08:52
Message-ID: 430EF844.70905@tweakers.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi list,

I'm writing an application that will aggregate records with a few
million records into averages/sums/minimums etc grouped per day.

Clients can add filters and do lots of customization on what they want
to see. And I've to translate that to one or more queries. Basically, I
append each filter as either an extra and-in-the-where or joined with
the clauses as ON-clause. The application now uses 8.1devel but I think
the basic plans are similar to 8.0. At least for this first query.

I noticed a query taking over 25 seconds to execute:

SELECT "make a timestamp" grouper, chart_2.Prijs as field_2_0
FROM
pwprijs as chart_2
JOIN pwprodukten t_0 ON chart_2.ProduktID = t_0.ID AND t_0.Cat2 IN
(SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)
JOIN pwprijs t_1 ON chart_2.ProduktID = t_1.ProduktID
AND t_1.LeverancierID = 938 AND t_1.recordtimestamp >= "last
timestamp"
WHERE
chart_2.Prijs > 0

It yields quite a long plan, so I've send that as an attachment along.
Basically it combines two tables against an original to fetch "all
prices (of all suppliers) for products of a certain category that are
sold by a certain supplier".

I was wondering how rewriting it to subselects would improve
performance, but that wasn't a very clear winner. It shaved of about 5
seconds. So I took the subselects and used INTERSECT to unite them and
have only one IN-clause in the query. That made it go down to around 13
seconds.

I noticed it was doing a seq scan on the largest table to get the "Prijs
> 0"-condition. But since there are only 947 of the 7692207 with prijs
= 0 and none with < 0, it shouldn't be the main thing to look for.
Dropping the clause made a minor improvement in performance for the queries.

But disabling sequential scans allowed an improvement to only 660 ms
compared to the 13 seconds earlier! Row-estimates seem to be quite a bit
off, so I already set the statistics target to 1000 and re-analyzed.
Btw, adding the prijs-clause again would make it choose another index
and thus resulted in much longer operation.

The final query, only taking 650ms, would be:

SELECT
"make a timestamp" as grouper,
chart_2.Prijs as field_2_0
FROM
pwprijs as chart_2
WHERE
chart_2.ProduktID IN (SELECT ID FROM pwprodukten WHERE Cat2 IN
(SELECT 545 UNION SELECT ID FROM cat WHERE ParentID = 545)
INTERSECT
SELECT produktid FROM pwprijs WHERE LeverancierID = 938
AND recordtimestamp >= "last timestamp")

So I'm wondering: how can I make postgres decide to use the (correct)
index without having to disable seq scans and how can I still add the
prijs-clause without dropping the index for it (since it should be used
for other filters). And for ease of use in my application I'd prefer to
use the first query or the version with two seperate IN-clauses.

Is that possible?

I left all the configuration-stuff to the defaults since changing values
didn't seem to impact much. Apart from the buffers and effective cache,
increasing those made the performance worse.

Best regards,

Arjen

Attachment Content-Type Size
query-plans.txt text/plain 21.6 KB

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Huxton 2005-08-26 13:05:23 Re: Inefficient queryplan for query with intersectable
Previous Message Richard Huxton 2005-08-26 08:34:31 Re: postmaster memory keep going up????