Re: BUG #3667: Job scheduling with Greenplum fails

From: "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com>
To: 'Andrew Sullivan' <ajs(at)crankycanuck(dot)ca>
Cc: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3667: Job scheduling with Greenplum fails
Date: 2007-10-11 15:42:51
Message-ID: 15362F202C62EA4590F5F3E5FA15021E052866AB@nasappexc04.asurion.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Like I said in the email, I think the best solution is to put the code in a
function. However, after I inserted a total of 25 jobs, the difference is
more noticeable and my sql is better in terms of cost and total time.

vacuum analyze pgagent.pga_job;
vacuum analyze pgagent.pga_jobclass;
vacuum analyze pgagent.pga_jobagent;

Original SQL:
explain analyze
SELECT *,
(SELECT jlgstatus FROM pgagent.pga_joblog jl WHERE jl.jlgjobid =
j.jobid ORDER BY jlgid DESC LIMIT 1) AS joblastresult
FROM pgagent.pga_job j
JOIN pgagent.pga_jobclass cl ON cl.jclid=jobjclid
LEFT OUTER JOIN pgagent.pga_jobagent ag ON ag.jagpid=jobagentid
-- + restriction +
ORDER BY jobname
"Sort (cost=291.31..291.37 rows=25 width=127) (actual time=2.125..2.181
rows=25 loops=1)"
" Sort Key: j.jobname"
" -> Hash Join (cost=2.13..290.73 rows=25 width=127) (actual
time=0.204..1.823 rows=25 loops=1)"
" Hash Cond: (j.jobjclid = cl.jclid)"
" -> Hash Left Join (cost=1.02..4.38 rows=25 width=105) (actual
time=0.061..0.264 rows=25 loops=1)"
" Hash Cond: (j.jobagentid = ag.jagpid)"
" -> Seq Scan on pga_job j (cost=0.00..3.25 rows=25 width=63)
(actual time=0.009..0.074 rows=25 loops=1)"
" -> Hash (cost=1.01..1.01 rows=1 width=42) (actual
time=0.021..0.021 rows=1 loops=1)"
" -> Seq Scan on pga_jobagent ag (cost=0.00..1.01
rows=1 width=42) (actual time=0.005..0.008 rows=1 loops=1)"
" -> Hash (cost=1.05..1.05 rows=5 width=22) (actual
time=0.050..0.050 rows=5 loops=1)"
" -> Seq Scan on pga_jobclass cl (cost=0.00..1.05 rows=5
width=22) (actual time=0.005..0.026 rows=5 loops=1)"
" SubPlan"
" -> Limit (cost=0.00..11.40 rows=1 width=9) (actual
time=0.045..0.045 rows=0 loops=25)"
" -> Index Scan Backward using pga_joblog_pkey on pga_joblog
jl (cost=0.00..68.38 rows=6 width=9) (actual time=0.036..0.036 rows=0
loops=25)"
" Filter: (jlgjobid = $0)"
"Total runtime: 2.436 ms"

My SQL:
explain analyze
select j.*, cl.*, ag.*, sub3.jlgstatus
from pgagent.pga_job j join
pgagent.pga_jobclass cl on cl.jclid=jobjclid left outer join
pgagent.pga_jobagent ag on ag.jagpid=jobagentid
join (select j2.jlgstatus,
sub.jlgjobid
from pgagent.pga_joblog j2 join
(select jl.jlgjobid,
max(jl.jlgid) as max_jlgid
from pgagent.pga_joblog jl
group by jl.jlgjobid) sub
on sub.jlgjobid = j2.jlgjobid and
sub.max_jlgid = j2.jlgid) sub3
on sub3.jlgjobid = j.jobid
-- + restriction +
order by jobname

"Sort (cost=68.35..68.36 rows=1 width=132) (actual time=1.026..1.033 rows=2
loops=1)"
" Sort Key: j.jobname"
" -> Nested Loop (cost=36.69..68.34 rows=1 width=132) (actual
time=0.877..0.961 rows=2 loops=1)"
" -> Nested Loop Left Join (cost=36.69..67.58 rows=1 width=110)
(actual time=0.838..0.882 rows=2 loops=1)"
" -> Hash Join (cost=36.69..66.82 rows=1 width=68) (actual
time=0.810..0.830 rows=2 loops=1)"
" Hash Cond: ((j2.jlgjobid = j.jobid) AND (j2.jlgid =
sub.max_jlgid))"
" -> Seq Scan on pga_joblog j2 (cost=0.00..21.50
rows=1150 width=13) (actual time=0.024..0.130 rows=44 loops=1)"
" -> Hash (cost=36.31..36.31 rows=25 width=71) (actual
time=0.542..0.542 rows=2 loops=1)"
" -> Hash Join (cost=30.81..36.31 rows=25
width=71) (actual time=0.506..0.523 rows=2 loops=1)"
" Hash Cond: (sub.jlgjobid = j.jobid)"
" -> HashAggregate (cost=27.25..29.75
rows=200 width=8) (actual time=0.270..0.275 rows=2 loops=1)"
" -> Seq Scan on pga_joblog jl
(cost=0.00..21.50 rows=1150 width=8) (actual time=0.013..0.116 rows=44
loops=1)"
" -> Hash (cost=3.25..3.25 rows=25
width=63) (actual time=0.206..0.206 rows=25 loops=1)"
" -> Seq Scan on pga_job j
(cost=0.00..3.25 rows=25 width=63) (actual time=0.011..0.082 rows=25
loops=1)"
" -> Index Scan using pga_jobagent_pkey on pga_jobagent ag
(cost=0.00..0.75 rows=1 width=42) (actual time=0.006..0.006 rows=0 loops=2)"
" Index Cond: (ag.jagpid = j.jobagentid)"
" -> Index Scan using pga_jobclass_pkey on pga_jobclass cl
(cost=0.00..0.75 rows=1 width=22) (actual time=0.016..0.020 rows=1 loops=2)"
" Index Cond: (cl.jclid = j.jobjclid)"
"Total runtime: 1.406 ms"

I think this trend will continue as more and more jobs are inserted.

BOOYA!

Jon
-----Original Message-----
From: Andrew Sullivan [mailto:ajs(at)crankycanuck(dot)ca]
Sent: Thursday, October 11, 2007 10:14 AM
To: Roberts, Jon
Cc: 'Tom Lane'; pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #3667: Job scheduling with Greenplum fails

On Thu, Oct 11, 2007 at 07:31:44AM -0500, Roberts, Jon wrote:

> The cost is significantly lower but the total runtime is higher.

Um, so you want developers to change the thing so that it performs
more slowly, but has a prettier estimate of how much work it's going
to do? That seems like a poor optimisation to me.

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
The whole tendency of modern prose is away from concreteness.
--George Orwell

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Greg Sabino Mullane 2007-10-11 22:06:45 Deferred FK / PK deletion problems
Previous Message Andrew Sullivan 2007-10-11 15:14:01 Re: BUG #3667: Job scheduling with Greenplum fails