Trying to eliminate union and sort

From: Brian Fehrle <brianf(at)consistentstate(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: "Kempter, Kevin" <kevink(at)consistentstate(dot)com>
Subject: Trying to eliminate union and sort
Date: 2013-07-11 18:27:48
Message-ID: 51DEF924.1040809@consistentstate.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi All,

(basic info)
PostgreSQL 9.2.4
64 bit Linux host
4GB shared_buffers with 14GB system memory, dedicated database VM
10MB work_mem

I have a query that takes over 6 minutes to complete, and it's due
mainly to the two sorting operations being done on this query. The data
it is returning itself is quite large, and with the two sorting
operations it does (one because of the union, one because of the order
by), it ends up using about *2.1GB* of temporary file space (and there
is no way I can increase work_mem to hold this).

--[Query]----------------------------
SELECT
t.id,
t.mycolumn1,
(SELECT table3.otherid FROM table3 WHERE table3.id = t.typeid),
(SELECT table3.otherid FROM table3 WHERE table3.id = t2.third_id),
t.mycolumn2 AS mycolumn2

FROM table1 t

LEFT OUTER JOIN table2 t2 ON t2.real_id = t.id

WHERE t.external_id IN ('6544', '2234', '2', '4536')

UNION

SELECT
t.id,
t.mycolumn1,
(SELECT table3.otherid FROM table3 WHERE table3.id = t.typeid),
(SELECT table3.otherid FROM table3 WHERE table3.id = t2.third_id),
t.mycolumn2 AS mycolumn2

FROM table1 t

LEFT OUTER JOIN table2 t2 ON t2.real_id = t.backup_id

WHERE t.external_id IN ('6544', '2234', '2', '4536')

ORDER BY t.mycolumn2, t.id;

--[Explain Analyze (sorry for the anonymizing)]----------------------------
Sort (cost=133824460.450..133843965.150 rows=7801882 width=56) (actual
time=849405.656..894724.453 rows=9955729 loops=1)
Sort Key: romeo.three, romeo.quebec_seven
*Sort Method: external merge Disk: 942856kB*
-> Unique (cost=132585723.530..132702751.760 rows=7801882 width=56)
(actual time=535267.196..668982.694 rows=9955729 loops=1)
-> Sort (cost=132585723.530..132605228.240 rows=7801882
width=56) (actual time=535267.194..649304.631 rows=10792011 loops=1)
Sort Key: romeo.quebec_seven, romeo.golf, ((delta_four
3)), ((delta_four 4)), romeo.three
*Sort Method: external merge Disk: 1008216kB*
-> Append (cost=0.000..131464014.850 rows=7801882
width=56) (actual time=0.798..291477.595 rows=10792011 loops=1)
-> Merge Left Join (cost=0.000..46340412.140
rows=2748445 width=56) (actual time=0.797..70748.213 rows=3690431 loops=1)
Merge Cond: (romeo.quebec_seven =
alpha_bravo.six_kilo)
-> Index Scan using juliet on zulu romeo
(cost=0.000..163561.710 rows=2748445 width=52) (actual
time=0.019..11056.883 rows=3653472 loops=1)
Filter: (delta_uniform = ANY
('two'::integer[]))
-> Index Only Scan using sierra on
quebec_juliet alpha_bravo (cost=0.000..86001.860 rows=2191314 width=8)
(actual time=0.047..3996.543 rows=2191314 loops=1)
Heap Fetches: 2191314
SubPlan
-> Index Scan using oscar on six_delta
(cost=0.000..8.380 rows=1 width=23) (actual time=0.009..0.009 rows=1
loops=3690431)
Index Cond: (quebec_seven = romeo.lima)
SubPlan
-> Index Scan using oscar on six_delta
(cost=0.000..8.380 rows=1 width=23) (actual time=0.001..0.001 rows=0
loops=3690431)
Index Cond: (quebec_seven =
alpha_bravo.delta_november)
-> Merge Right Join (cost=0.000..85045583.890
rows=5053437 width=56) (actual time=0.843..213450.477 rows=7101580 loops=1)
Merge Cond: (alpha_bravo.six_kilo =
romeo.india)
-> Index Only Scan using sierra on
quebec_juliet alpha_bravo (cost=0.000..86001.860 rows=2191314 width=8)
(actual time=0.666..6165.870 rows=2191314 loops=1)
Heap Fetches: 2191314
-> Materialize (cost=0.000..193106.580
rows=2748445 width=56) (actual time=0.134..25852.353 rows=7101580 loops=1)
-> Index Scan using alpha_seven on
zulu romeo (cost=0.000..186235.470 rows=2748445 width=56) (actual
time=0.108..18439.857 rows=3653472 loops=1)
Filter: (delta_uniform = ANY
('two'::integer[]))
SubPlan
-> Index Scan using oscar on six_delta
(cost=0.000..8.380 rows=1 width=23) (actual time=0.009..0.010 rows=1
loops=7101580)
Index Cond: (quebec_seven = romeo.lima)
SubPlan
-> Index Scan using oscar on six_delta
(cost=0.000..8.380 rows=1 width=23) (actual time=0.007..0.008 rows=1
loops=7101580)
Index Cond: (quebec_seven =
alpha_bravo.delta_november)
--[end]----------------------------

*My attempts:*
1. I tried to get an index set up on table1 that orders t.mycolumn2 and
t.id so that the sorting operation might be skipped, however when table1
is accessed it is using an index that's on t.id due to the left join to
table2, then filters out the "WHERE t.external_id IN ('6544', '2234',
'2', '4536')", so at this point the data I've accessed came from
something other than my 'order by' and must be reordered at the end.

NOTE: The where clause (WHERE t.external_id IN ('6544', '2234', '2',
'4536')) alone doesn't use an index at all, but results in a sequential
scan (result set is about 60% of the total rows I believe).

I've been unable to get the query planner to pick an index on
(t.mycolumn2, t.id).

2. I reworked the entire query to eliminate the subqueries living all in
joins. Overall the query LOOKS more efficient, but takes about the same
amount of time as the main one because most of the execution time is
done in the two sorting operations.

3. I'm trying to eliminate the union, however I have two problems.
A) I can't figure out how to have an 'or' clause in a single join that
would fetch all the correct rows. If I just do:
LEFT OUTER JOIN table2 t2 ON (t2.real_id = t.id OR t2.real_id =
t.backup_id), I end up with many less rows than the original query. B.

I believe the issue with this is a row could have one of three
possibilities:
* part of the first query but not the second -> results in 1 row after
the union
* part of the second query but not the first -> results in 1 row after
the union
* part of the first query and the second -> results in 2 rows after the
union (see 'B)' for why)

B) the third and fourth column in the SELECT will need to be different
depending on what column the row is joined on in the LEFT OUTER JOIN to
table2, so I may need some expensive case when logic to filter what is
put there based on whether that row came from the first join clause, or
the second.

Any thoughts or things I'm looking over? Any help would be greatly
appreciated. My first goal is to get rid of the sort by the UNION, if
possible. The second would be to eliminate the last sort by the ORDER
BY, but I'm not sure if it will be easily doable.

Thanks,
- Brian F

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2013-07-12 00:46:18 Re: Trying to eliminate union and sort
Previous Message Abhijit Menon-Sen 2013-07-11 05:09:58 Re: [PERFORM] In progress INSERT wrecks plans on table