
Re: Forcing filter/join order?

From: "Peter Darley" <pdarley(at)kinesis-cem(dot)com>
To: <josh(at)agliodbs(dot)com>,"pgsql-performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Forcing filter/join order?
Date: 2004-02-19 00:40:33
Message-ID: PDEOIIFFBIAABMGNJAGPIENFCNAA.pdarley@kinesis-cem.com
Lists: pgsql-performance
Josh,
	I'm sure the big brains have a better suggestion, but in the meantime
could you do something as simple as:

SELECT *
FROM (select * from events where event_date BETWEEN 'date1' AND 'date2') e
LEFT OUTER JOIN cases ON e.case_id = cases.case_id;
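	One caveat on that rewrite: the planner will usually flatten the
subselect right back into the outer join, producing the same plan. The
usual trick to keep the subselect as an optimization fence is a dummy
OFFSET 0, along these lines (same tables and placeholder dates as above;
a sketch, not tested):

```sql
SELECT *
FROM (SELECT *
      FROM events
      WHERE event_date BETWEEN 'date1' AND 'date2'
      OFFSET 0) e   -- OFFSET 0 prevents the planner from flattening the subquery
LEFT OUTER JOIN cases ON e.case_id = cases.case_id;
```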

Thanks,
Peter Darley

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org]On Behalf Of Josh Berkus
Sent: Wednesday, February 18, 2004 4:11 PM
To: pgsql-performance
Subject: [PERFORM] Forcing filter/join order?


Folks,

Have an interesting issue with a complex query, where apparently I need to
twist the query planner's arm, and am looking for advice on how to do so.

The situation:  I have a table, events, with about 300,000 records, which is
outer joined to a second table, cases, with about 150,000 records.

A very simplified version of the query looks like:

SELECT *
FROM events LEFT OUTER JOIN cases ON events.case_id = cases.case_id
WHERE events.event_date BETWEEN 'date1' AND 'date2'

This join is very expensive, as you can imagine.  Yet I can't seem to force
the query planner to apply the filter conditions to the events table *before*
attempting to join it to cases.  Here are the crucial explain lines:

->  Merge Left Join  (cost=0.00..11880.82 rows=15879 width=213) (actual time=5.777..901.899 rows=648 loops=1)
      Merge Cond: ("outer".case_id = "inner".case_id)
      Join Filter: (("outer".link_type)::text = 'case'::text)
      ->  Index Scan using idx_event_ends on events  (cost=0.00..4546.15 rows=15879 width=80) (actual time=4.144..333.769 rows=648 loops=1)
            Filter: ((status <> 0) AND ((event_date + duration) >= '2004-02-18 00:00:00'::timestamp without time zone) AND (event_date <= '2004-03-05 23:59:00'::timestamp without time zone))
      ->  Index Scan using cases_pkey on cases  (cost=0.00..6802.78 rows=117478 width=137) (actual time=0.139..402.363 rows=116835 loops=1)

As you can see, part of the problem is a pretty drastic (20x) mis-estimation
of the selectivity of the date limits on events, which accounts for 90% of the
execution time of my query on this one join.  I've tried raising the
statistics targets on event_date, duration, and case_id (to 1000), but this
doesn't seem to affect the estimate or the query plan.
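For reference, the statistics change tried above would be along these lines
(1000 is the maximum target in this version; a re-ANALYZE is needed before
the new target takes effect):

```sql
-- Raise the per-column statistics targets, then refresh the stats:
ALTER TABLE events ALTER COLUMN event_date SET STATISTICS 1000;
ALTER TABLE events ALTER COLUMN duration   SET STATISTICS 1000;
ALTER TABLE events ALTER COLUMN case_id    SET STATISTICS 1000;
ANALYZE events;
```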

In the above test, idx_event_ends indexes (case_id, status, event_date,
(event_date + duration)), but as you can see the planner uses only the first
column.  This was an attempt to circumvent the planner's tendency to
completely ignore any index on (event_date, (event_date + duration)), even
though that index is the most selective combination on the events table.
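The two indexes described above would have been created roughly like this
(index names are as given or hypothetical; note the extra parentheses
required around the expression column):

```sql
-- The index the planner actually used (first column only):
CREATE INDEX idx_event_ends ON events
    (case_id, status, event_date, ((event_date + duration)));

-- The more selective combination the planner ignores:
CREATE INDEX idx_event_period ON events
    (event_date, ((event_date + duration)));
```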

Is there anything I can do to force the query planner to filter on events
before joining cases, other than waiting for version 7.5?

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

