BUG #5294: Sorts on more than just the order-by clause

From: "Allen Johnson" <akjohnson78(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5294: Sorts on more than just the order-by clause
Date: 2010-01-21 16:19:51
Message-ID: 201001211619.o0LGJp5I043918@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5294
Logged by: Allen Johnson
Email address: akjohnson78(at)gmail(dot)com
PostgreSQL version: 8.4.2
Operating system: Red Hat ES 5.4
Description: Sorts on more than just the order-by clause
Details:

I've been porting our app from Oracle to Postgres and keeping an eye on
performance. Largely, Postgres is performing just as well or better!

I did run into an issue where we are performing a group-by on about 10
columns followed by an order-by of about 5 columns. This query was taking
twice as long as Oracle. When looking at the explain plan, Postgres seems to
be using all the columns in the group-by for sorting instead of _only_ using
what is in the order-by.

While the results are correct this seems to be causing a performance problem
since Postgres is sorting on more columns than it is being asked to. I
reworked the query to get rid of the extra sorting columns but this seems
like a hack. Below is an example query followed by the ugly hack.

Note: The execution times in this example don't mean anything because they
are running on a blank test db. On the production database there was a huge
difference in execution time in favor of the hack query. I just wanted to
illustrate that the sort keys seem incorrect.

Example Query:

select
ct.name, c.lname, c.fname, c.mname,
c.email, c.address1, c.address2,
c.city, c.state, c.zip_code,
count(a.id)
from
contacts c
inner join contact_types ct on (ct.code = c.contact_type_code)
left join attachments a on (a.contact_id = c.id)
where
c.company_id = 1
group by
ct.name, c.lname, c.fname, c.mname,
c.email, c.address1, c.address2,
c.city, c.state, c.zip_code
order by
ct.name, c.lname, c.fname, c.mname;

Example Explain:
QUERY
PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------

GroupAggregate (cost=27.98..28.11 rows=3 width=1864) (actual
time=0.037..0.037 rows=0 loops=1)
-> Sort (cost=27.98..27.99 rows=3 width=1864) (actual time=0.035..0.035
rows=0 loops=1)
Sort Key: ct.name, c.lname, c.fname, c.mname, c.email, c.address1,
c.address2, c.city, c.state, c.zip_code
Sort Method: quicksort Memory: 17kB
-> Nested Loop Left Join (cost=4.27..27.96 rows=3 width=1864)
(actual time=0.017..0.017 rows=0 loops=1)
-> Nested Loop (cost=0.00..16.55 rows=1 width=1864) (actual
time=0.014..0.014 rows=0 loops=1)
-> Index Scan using contacts_company on contacts c
(cost=0.00..8.27 rows=1 width=1760) (actual time=0.012..0.012 rows=0
loops=1)
Index Cond: (company_id = 1)
-> Index Scan using contact_types_pkey on
contact_types ct (cost=0.00..8.27 rows=1 width=152) (never executed)
Index Cond: ((ct.code)::text =
(c.contact_type_code)::text)
-> Bitmap Heap Scan on attachments a (cost=4.27..11.37
rows=3 width=12) (never executed)
Recheck Cond: (a.contact_id = c.id)
-> Bitmap Index Scan on attachments_contact
(cost=0.00..4.27 rows=3 width=0) (never executed)
Index Cond: (a.contact_id = c.id)
Total runtime: 0.192 ms
(15 rows)

* Notice how the sort key is using many more columns than the order-by has
specified.

Hack Query:
select * from (
select
ct.name as ct_name, c.lname, c.fname, c.mname,
c.email, c.address1, c.address2,
c.city, c.state, c.zip_code,
count(a.id)
from
contacts c
inner join contact_types ct on (ct.code = c.contact_type_code)
left join attachments a on (a.contact_id = c.id)
where
c.company_id = 1
group by
ct.name, c.lname, c.fname, c.mname,
c.email, c.address1, c.address2,
c.city, c.state, c.zip_code
) as results
order by
ct_name, lname, fname, mname;

Hack Explain:
QUERY
PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------

Sort (cost=28.13..28.14 rows=3 width=1868) (actual time=0.054..0.054
rows=0 loops=1)
Sort Key: ct.name, c.lname, c.fname, c.mname
Sort Method: quicksort Memory: 17kB
-> HashAggregate (cost=28.04..28.08 rows=3 width=1864) (actual
time=0.021..0.021 rows=0 loops=1)
-> Nested Loop Left Join (cost=4.27..27.96 rows=3 width=1864)
(actual time=0.018..0.018 rows=0 loops=1)
-> Nested Loop (cost=0.00..16.55 rows=1 width=1864) (actual
time=0.016..0.016 rows=0 loops=1)
-> Index Scan using contacts_company on contacts c
(cost=0.00..8.27 rows=1 width=1760) (actual time=0.013..0.013 rows=0
loops=1)
Index Cond: (company_id = 1)
-> Index Scan using contact_types_pkey on
contact_types ct (cost=0.00..8.27 rows=1 width=152) (never executed)
Index Cond: ((ct.code)::text =
(c.contact_type_code)::text)
-> Bitmap Heap Scan on attachments a (cost=4.27..11.37
rows=3 width=12) (never executed)
Recheck Cond: (a.contact_id = c.id)
-> Bitmap Index Scan on attachments_contact
(cost=0.00..4.27 rows=3 width=0) (never executed)
Index Cond: (a.contact_id = c.id)
Total runtime: 0.259 ms
(15 rows)

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dave Page 2010-01-21 16:30:37 Re: BUG #5292: Corrupted installer
Previous Message Kevin Grittner 2010-01-21 16:15:29 Re: BUG #5292: Corrupted installer