Re: bad performances using hashjoin

From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: bad performances using hashjoin
Date: 2005-02-21 00:45:03
Message-ID: 42192F0F.30404@bigfoot.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> Gaetano Mendola <mendola(at)bigfoot(dot)com> writes:
>
>>If you need other info in order to improve the planner,
>
>
> ... like, say, the PG version you are using, or the definitions of the
> views involved? It's difficult to say much of anything about this.

That is true, sorry I forgot it :-(
The engine is a 7.4.5 and these are the views definitions:

sat_request is just a table

CREATE OR REPLACE VIEW v_sc_packages AS
SELECT *
FROM
v_programs vpr,
v_packages vpk,
v_sequences vs
WHERE
------------ JOIN -------------
vpr.id_program = vs.id_program AND
vpk.id_package = vs.id_package AND
-------------------------------
vs.estimated_start IS NOT NULL
ORDER BY vs.estimated_start;

CREATE OR REPLACE VIEW v_programs AS
SELECT *
FROM programs
WHERE id_program<>0
ORDER BY id_publisher, id_program
;

CREATE OR REPLACE VIEW v_packages AS
SELECT *
FROM packages p LEFT OUTER JOIN package_security ps USING (id_package)
ORDER BY p.id_publisher, p.name
;

CREATE OR REPLACE VIEW v_sequences AS
SELECT id_package AS id_package,
id_program AS id_program,
internal_position AS internal_position,
estimated_start AS estimated_start
FROM sequences
ORDER BY id_program, internal_position
;

> However: the reason the second plan wins is because there are zero rows
> fetched from sat_request, and so the bulk of the plan is never executed
> at all. I doubt the second plan would win if there were any matching
> sat_request rows. If this is the case you actually need to optimize,
> probably the thing to do is to get rid of the ORDER BY clauses you
> evidently have in your views, so that there's some chance of building
> a fast-start plan.

Removed all order by from that views, this is the comparison between the two
(orderdered and not ordered):

empdb=# explain analyze SELECT id_sat_request
empdb-# FROM sat_request sr,
empdb-# v_sc_packages vs
empdb-# WHERE ----- JOIN ----
empdb-# sr.id_package = vs.id_package AND
empdb-# ---------------
empdb-# id_user = 29416 AND
empdb-# id_url = 424364 AND
empdb-# vs.estimated_start > now() AND
empdb-# id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=280.98..284.74 rows=1 width=4) (actual time=895.344..895.344 rows=0 loops=1)
Hash Cond: ("outer".id_package = "inner".id_package)
-> Subquery Scan vs (cost=277.94..280.19 rows=301 width=4) (actual time=893.191..894.396 rows=569 loops=1)
-> Sort (cost=277.94..278.69 rows=301 width=263) (actual time=893.184..893.546 rows=569 loops=1)
Sort Key: vs.estimated_start
-> Hash Join (cost=232.61..265.54 rows=301 width=263) (actual time=868.980..889.643 rows=569 loops=1)
Hash Cond: ("outer".id_package = "inner".id_package)
-> Subquery Scan vpk (cost=150.29..159.26 rows=1196 width=218) (actual time=822.281..834.063 rows=1203 loops=1)
-> Sort (cost=150.29..153.28 rows=1196 width=159) (actual time=822.266..823.190 rows=1203 loops=1)
Sort Key: p.id_publisher, p.name
-> Hash Left Join (cost=16.14..89.16 rows=1196 width=159) (actual time=3.504..809.262 rows=1203 loops=1)
Hash Cond: ("outer".id_package = "inner".id_package)
-> Seq Scan on packages p (cost=0.00..53.98 rows=1196 width=143) (actual time=0.018..13.869 rows=1203 loops=1)
-> Hash (cost=14.09..14.09 rows=818 width=20) (actual time=2.395..2.395 rows=0 loops=1)
-> Seq Scan on package_security ps (cost=0.00..14.09 rows=818 width=20) (actual time=0.020..1.520 rows=845 loops=1)
-> Hash (cost=82.19..82.19 rows=51 width=49) (actual time=46.402..46.402 rows=0 loops=1)
-> Merge Join (cost=79.54..82.19 rows=51 width=49) (actual time=39.435..45.376 rows=569 loops=1)
Merge Cond: ("outer".id_program = "inner".id_program)
-> Subquery Scan vs (cost=70.98..72.59 rows=214 width=16) (actual time=16.834..19.102 rows=569 loops=1)
-> Sort (cost=70.98..71.52 rows=214 width=20) (actual time=16.824..17.338 rows=569 loops=1)
Sort Key: sequences.id_program, sequences.internal_position
-> Seq Scan on sequences (cost=0.00..62.70 rows=214 width=20) (actual time=0.638..11.969 rows=569 loops=1)
Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) > now()))
-> Sort (cost=8.56..8.68 rows=47 width=37) (actual time=22.580..23.123 rows=605 loops=1)
Sort Key: vpr.id_program
-> Subquery Scan vpr (cost=6.90..7.25 rows=47 width=37) (actual time=22.294..22.464 rows=48 loops=1)
-> Sort (cost=6.90..7.02 rows=47 width=61) (actual time=22.287..22.332 rows=48 loops=1)
Sort Key: programs.id_publisher, programs.id_program
-> Seq Scan on programs (cost=0.00..5.60 rows=47 width=61) (actual time=4.356..22.068 rows=48 loops=1)
Filter: (id_program <> 0)
-> Hash (cost=3.04..3.04 rows=1 width=8) (actual time=0.033..0.033 rows=0 loops=1)
-> Index Scan using idx_id_url_sat_request on sat_request sr (cost=0.00..3.04 rows=1 width=8) (actual time=0.031..0.031 rows=0 loops=1)
Index Cond: (id_url = 424364)
Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
Total runtime: 897.044 ms
(35 rows)

empdb=# explain analyze SELECT id_sat_request
empdb-# FROM sat_request sr,
empdb-# v_sc_packages_new vs
empdb-# WHERE ----- JOIN ----
empdb-# sr.id_package = vs.id_package AND
empdb-# ---------------
empdb-# id_user = 29416 AND
empdb-# id_url = 424364 AND
empdb-# vs.estimated_start > now() AND
empdb-# id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=19.18..96.87 rows=1 width=4) (actual time=15.576..15.576 rows=0 loops=1)
-> Nested Loop (cost=19.18..93.04 rows=1 width=8) (actual time=15.569..15.569 rows=0 loops=1)
-> Hash Join (cost=19.18..89.21 rows=1 width=12) (actual time=15.566..15.566 rows=0 loops=1)
Hash Cond: ("outer".id_package = "inner".id_package)
-> Hash Left Join (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1)
Hash Cond: ("outer".id_package = "inner".id_package)
-> Seq Scan on packages p (cost=0.00..53.98 rows=1196 width=4) (actual time=0.028..2.694 rows=1203 loops=1)
-> Hash (cost=14.09..14.09 rows=818 width=4) (actual time=6.707..6.707 rows=0 loops=1)
-> Seq Scan on package_security ps (cost=0.00..14.09 rows=818 width=4) (actual time=0.026..4.620 rows=845 loops=1)
-> Hash (cost=3.04..3.04 rows=1 width=8) (actual time=0.061..0.061 rows=0 loops=1)
-> Index Scan using idx_id_url_sat_request on sat_request sr (cost=0.00..3.04 rows=1 width=8) (actual time=0.056..0.056 rows=0 loops=1)
Index Cond: (id_url = 424364)
Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
-> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.82 rows=1 width=8) (never executed)
Index Cond: ("outer".id_package = sequences.id_package)
Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) > now()))
-> Index Scan using programs_pkey on programs (cost=0.00..3.83 rows=1 width=4) (never executed)
Index Cond: (programs.id_program = "outer".id_program)
Filter: (id_program <> 0)
Total runtime: 16.279 ms
(20 rows)

The second one of course is faster, this is the second select with hashjoin disabled:

empdb=# set enable_hashjoin = false;
SET
empdb=# explain analyze SELECT id_sat_request
empdb-# FROM sat_request sr,
empdb-# v_sc_packages_new vs
empdb-# WHERE ----- JOIN ----
empdb-# sr.id_package = vs.id_package AND
empdb-# ---------------
empdb-# id_user = 29416 AND
empdb-# id_url = 424364 AND
empdb-# vs.estimated_start > now() AND
empdb-# id_sat_request_status = sp_lookup_id('sat_request_status', 'Scheduled');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=10.62..175.83 rows=1 width=4) (actual time=0.280..0.280 rows=0 loops=1)
Merge Cond: ("outer".id_package = "inner".id_package)
-> Merge Left Join (cost=0.00..162.21 rows=1196 width=4) (actual time=0.188..0.188 rows=1 loops=1)
Merge Cond: ("outer".id_package = "inner".id_package)
-> Index Scan using packages_pkey on packages p (cost=0.00..115.51 rows=1196 width=4) (actual time=0.085..0.085 rows=1 loops=1)
-> Index Scan using package_security_id_package_key on package_security ps (cost=0.00..39.06 rows=818 width=4) (actual time=0.080..0.080 rows=1 loops=1)
-> Sort (cost=10.62..10.62 rows=1 width=12) (actual time=0.087..0.087 rows=0 loops=1)
Sort Key: sr.id_package
-> Nested Loop (cost=0.00..10.61 rows=1 width=12) (actual time=0.069..0.069 rows=0 loops=1)
-> Nested Loop (cost=0.00..6.77 rows=1 width=16) (actual time=0.067..0.067 rows=0 loops=1)
-> Index Scan using idx_id_url_sat_request on sat_request sr (cost=0.00..3.04 rows=1 width=8) (actual time=0.065..0.065 rows=0 loops=1)
Index Cond: (id_url = 424364)
Filter: ((id_user = 29416) AND (id_sat_request_status = 1))
-> Index Scan using idx_sequences_id_package on sequences (cost=0.00..3.72 rows=1 width=8) (never executed)
Index Cond: ("outer".id_package = sequences.id_package)
Filter: ((estimated_start IS NOT NULL) AND (date_trunc('seconds'::text, estimated_start) > now()))
-> Index Scan using programs_pkey on programs (cost=0.00..3.83 rows=1 width=4) (never executed)
Index Cond: (programs.id_program = "outer".id_program)
Filter: (id_program <> 0)
Total runtime: 0.604 ms
(20 rows)

I see the problem is still here:
Hash Left Join (cost=16.14..80.19 rows=1196 width=4) (actual time=7.291..13.620 rows=1203 loops=1)

BTW, at this time the executions time seen are lower because right now is not a peak hour

> BTW, I believe in 8.0 the first plan would be about as fast as the
> second, because we added some code to hash join to fall out without
> scanning the left input if the right input is empty.

I'll take it a try if you are really interested in the results.

Regards
Gaetano Mendola

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Gaetano Mendola 2005-02-21 00:50:18 Re: Effects of IDLE processes
Previous Message Mark Kirkwood 2005-02-20 22:58:43 Re: Problem with 7.4.5 and webmin 1.8 in grant function