Re: complex query performance assistance request

From: John Mendenhall <john(at)surfutopia(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: complex query performance assistance request
Date: 2005-08-23 19:05:25
Message-ID: 20050823190525.GA27623@calvin.surfutopia.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom,

> > Would it be best to attempt to rewrite it for IN?
> > Or, should we try to tie it in with a join?
>
> Couldn't say without a deeper understanding of what you're trying to
> accomplish.

Here are the results of each SQL rewrite.

The first pass, I rewrote it as c.id IN ():
-----
LOG: duration: 2669.682 ms statement: explain analyze
SELECT
c.id AS contact_id,
sr.id AS sales_rep_id,
p.id AS partner_id,
coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
co.name AS contact_country,
c.master_key_token
FROM
sales_reps sr
JOIN partners p ON (sr.id = p.sales_rep_id)
JOIN contacts c ON (p.id = c.partner_id)
JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
WHERE
c.lead_deleted IS NULL
AND c.id IN
(
SELECT
lr.contact_id
FROM
lead_requests lr,
lead_request_status lrs
WHERE
lr.status_id = lrs.id AND
lrs.is_closed = 0
)
ORDER BY
contact_company, contact_id
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Sort (cost=4413.35..4416.16 rows=1123 width=102) (actual time=2617.069..2617.719 rows=1071 loops=1)
Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) || (c.lastname)::text)))), c.id
-> Merge Join (cost=4311.31..4356.45 rows=1123 width=102) (actual time=2549.717..2589.398 rows=1071 loops=1)
Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
-> Sort (cost=14.00..14.61 rows=242 width=19) (actual time=9.765..9.966 rows=240 loops=1)
Sort Key: lower((co.code)::text)
-> Seq Scan on countries co (cost=0.00..4.42 rows=242 width=19) (actual time=0.142..5.118 rows=242 loops=1)
-> Sort (cost=4297.31..4299.63 rows=928 width=95) (actual time=2539.685..2540.913 rows=1071 loops=1)
Sort Key: lower((c.country)::text)
-> Merge IN Join (cost=4163.02..4251.57 rows=928 width=95) (actual time=2377.539..2524.844 rows=1071 loops=1)
Merge Cond: ("outer".id = "inner".contact_id)
-> Sort (cost=1835.53..1851.27 rows=6296 width=95) (actual time=1843.866..1853.193 rows=6349 loops=1)
Sort Key: c.id
-> Merge Join (cost=75.65..1438.24 rows=6296 width=95) (actual time=51.713..1505.633 rows=6349 loops=1)
Merge Cond: ("outer".partner_id = "inner".id)
-> Index Scan using contacts_partner_id_idx on contacts c (cost=0.00..5303.84 rows=40243 width=85) (actual time=0.077..584.736 rows=40267 loops=1)
Filter: (lead_deleted IS NULL)
-> Sort (cost=75.65..76.37 rows=290 width=20) (actual time=51.508..62.288 rows=6462 loops=1)
Sort Key: p.id
-> Merge Join (cost=59.24..63.79 rows=290 width=20) (actual time=30.152..38.281 rows=395 loops=1)
Merge Cond: ("outer".id = "inner".sales_rep_id)
-> Sort (cost=2.42..2.52 rows=39 width=10) (actual time=1.390..1.505 rows=39 loops=1)
Sort Key: sr.id
-> Seq Scan on sales_reps sr (cost=0.00..1.39 rows=39 width=10) (actual time=0.026..0.380 rows=39 loops=1)
-> Sort (cost=56.82..57.55 rows=290 width=20) (actual time=28.558..29.120 rows=395 loops=1)
Sort Key: p.sales_rep_id
-> Nested Loop (cost=24.35..44.96 rows=290 width=20) (actual time=0.191..21.408 rows=395 loops=1)
Join Filter: ("inner".classification_id = "outer".id)
-> Seq Scan on partner_classification pc (cost=0.00..1.04 rows=2 width=10) (actual time=0.068..0.121 rows=2 loops=1)
Filter: ((classification)::text <> 'Sales Rep'::text)
-> Materialize (cost=24.35..28.70 rows=435 width=30) (actual time=0.029..5.380 rows=435 loops=2)
-> Seq Scan on partners p (cost=0.00..24.35 rows=435 width=30) (actual time=0.038..8.161 rows=435 loops=1)
-> Sort (cost=2327.50..2351.43 rows=9573 width=11) (actual time=533.508..535.629 rows=1742 loops=1)
Sort Key: lr.contact_id
-> Merge Join (cost=1520.94..1694.49 rows=9573 width=11) (actual time=302.932..461.644 rows=1745 loops=1)
Merge Cond: ("outer".id = "inner".status_id)
-> Sort (cost=1.28..1.30 rows=8 width=10) (actual time=0.392..0.404 rows=7 loops=1)
Sort Key: lrs.id
-> Seq Scan on lead_request_status lrs (cost=0.00..1.16 rows=8 width=10) (actual time=0.117..0.280 rows=7 loops=1)
Filter: (is_closed = 0::numeric)
-> Sort (cost=1519.66..1558.55 rows=15556 width=21) (actual time=302.423..321.939 rows=15387 loops=1)
Sort Key: lr.status_id
-> Seq Scan on lead_requests lr (cost=0.00..436.56 rows=15556 width=21) (actual time=0.029..164.708 rows=15559 loops=1)
Total runtime: 2632.987 ms
(44 rows)
-----

The second pass, I rewrote it to tie in with a JOIN, adding
a DISTINCT at the top to get rid of the duplicates:
-----
LOG: duration: 3285.645 ms statement: explain analyze
SELECT DISTINCT
c.id AS contact_id,
sr.id AS sales_rep_id,
p.id AS partner_id,
coalesce(LTRIM(RTRIM(c.company)), LTRIM(RTRIM(c.firstname || ' ' || c.lastname))) AS contact_company,
co.name AS contact_country,
c.master_key_token
FROM
sales_reps sr
JOIN partners p ON (sr.id = p.sales_rep_id)
JOIN contacts c ON (p.id = c.partner_id)
JOIN countries co ON (LOWER(c.country) = LOWER(co.code))
JOIN partner_classification pc ON (p.classification_id = pc.id AND pc.classification != 'Sales Rep')
JOIN lead_requests lr ON (c.id = lr.contact_id)
JOIN lead_request_status lrs ON (lr.status_id = lrs.id AND lrs.is_closed = 0)
WHERE
c.lead_deleted IS NULL
ORDER BY
contact_company, contact_id
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Unique (cost=3039.78..3071.46 rows=1810 width=102) (actual time=3219.707..3228.637 rows=1071 loops=1)
-> Sort (cost=3039.78..3044.31 rows=1810 width=102) (actual time=3219.695..3220.560 rows=1118 loops=1)
Sort Key: COALESCE(ltrim(rtrim((c.company)::text)), ltrim(rtrim((((c.firstname)::text || ' '::text) || (c.lastname)::text)))), c.id, sr.id, p.id, co.name, c.master_key_token
-> Merge Join (cost=2870.92..2941.85 rows=1810 width=102) (actual time=3156.788..3188.338 rows=1118 loops=1)
Merge Cond: ("outer"."?column3?" = "inner"."?column9?")
-> Sort (cost=14.00..14.61 rows=242 width=19) (actual time=9.196..9.445 rows=240 loops=1)
Sort Key: lower((co.code)::text)
-> Seq Scan on countries co (cost=0.00..4.42 rows=242 width=19) (actual time=0.128..3.914 rows=242 loops=1)
-> Sort (cost=2856.92..2860.66 rows=1496 width=95) (actual time=3147.340..3148.477 rows=1118 loops=1)
Sort Key: lower((c.country)::text)
-> Merge Join (cost=2750.88..2778.03 rows=1496 width=95) (actual time=3008.933..3132.122 rows=1118 loops=1)
Merge Cond: ("outer".id = "inner".status_id)
-> Sort (cost=1.28..1.30 rows=8 width=10) (actual time=0.366..0.379 rows=7 loops=1)
Sort Key: lrs.id
-> Seq Scan on lead_request_status lrs (cost=0.00..1.16 rows=8 width=10) (actual time=0.094..0.254 rows=7 loops=1)
Filter: (is_closed = 0::numeric)
-> Sort (cost=2749.60..2755.67 rows=2430 width=105) (actual time=3008.396..3023.502 rows=9992 loops=1)
Sort Key: lr.status_id
-> Merge Join (cost=1835.53..2612.95 rows=2430 width=105) (actual time=1975.714..2912.632 rows=10089 loops=1)
Merge Cond: ("outer".contact_id = "inner".id)
-> Index Scan using lead_requests_contact_id_idx on lead_requests lr (cost=0.00..683.87 rows=15556 width=21) (actual time=0.073..247.148 rows=15556 loops=1)
-> Sort (cost=1835.53..1851.27 rows=6296 width=95) (actual time=1975.273..1988.664 rows=10089 loops=1)
Sort Key: c.id
-> Merge Join (cost=75.65..1438.24 rows=6296 width=95) (actual time=56.107..1625.186 rows=6349 loops=1)
Merge Cond: ("outer".partner_id = "inner".id)
-> Index Scan using contacts_partner_id_idx on contacts c (cost=0.00..5303.84 rows=40243 width=85) (actual time=0.047..580.311 rows=40267 loops=1)
Filter: (lead_deleted IS NULL)
-> Sort (cost=75.65..76.37 rows=290 width=20) (actual time=55.935..65.502 rows=6462 loops=1)
Sort Key: p.id
-> Merge Join (cost=59.24..63.79 rows=290 width=20) (actual time=31.765..39.925 rows=395 loops=1)
Merge Cond: ("outer".id = "inner".sales_rep_id)
-> Sort (cost=2.42..2.52 rows=39 width=10) (actual time=1.072..1.117 rows=39 loops=1)
Sort Key: sr.id
-> Seq Scan on sales_reps sr (cost=0.00..1.39 rows=39 width=10) (actual time=0.022..0.312 rows=39 loops=1)
-> Sort (cost=56.82..57.55 rows=290 width=20) (actual time=30.489..30.893 rows=395 loops=1)
Sort Key: p.sales_rep_id
-> Nested Loop (cost=24.35..44.96 rows=290 width=20) (actual time=0.159..23.356 rows=395 loops=1)
Join Filter: ("inner".classification_id = "outer".id)
-> Seq Scan on partner_classification pc (cost=0.00..1.04 rows=2 width=10) (actual time=0.047..0.086 rows=2 loops=1)
Filter: ((classification)::text <> 'Sales Rep'::text)
-> Materialize (cost=24.35..28.70 rows=435 width=30) (actual time=0.028..6.124 rows=435 loops=2)
-> Seq Scan on partners p (cost=0.00..24.35 rows=435 width=30) (actual time=0.039..9.383 rows=435 loops=1)
Total runtime: 3241.139 ms
(43 rows)
-----

The DISTINCT ON condition was about the same amount of time,
statistically. Removing the DISTINCT entirely only gave a
very slight improvement in performance.

So, the bottom line is, unless there are other ideas to
improve the performance, I will most likely rewrite our
application to use the c.id IN () option.

Thank you very much for your input and suggestions.

JohnM

--
John Mendenhall
john(at)surfutopia(dot)net
surf utopia
internet services

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2005-08-23 19:23:43 Re: Caching by Postgres
Previous Message Donald Courtney 2005-08-23 18:41:39 Re: Caching by Postgres