Re: BUG #5294: Sorts on more than just the order-by clause

From: Allen Johnson <akjohnson78(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5294: Sorts on more than just the order-by clause
Date: 2010-01-22 21:07:39
Message-ID: 6786ed4f1001221307g3f34f41cn7a1aa8113886dbd2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Ok, I've generated a test database with:
* 20,000 users
* 250,000 contacts
* 1,124,700 attachments

The summary of the results is that the normal query takes about 32sec
on my machine. The hack query takes about 13sec.

Below are the queries and their `explain analyze` outputs. I've
attached the test layout as well as a zip file containing the ruby
scripts that generate the data in the default format that the 'copy'
command expects.

If anyone else wants to give it a try this is the procedure. I wrote
these scripts in a hurry so I'm sure there could be many improvements
:)

1. Apply the tables.sql file to a test database (might want to run the
create index commands after the data load)
2. Create the data files using scripts in the create-scripts.zip like this:
ruby create-user.rb ; creates /tmp/users
ruby create-contacts.rb ; creates /tmp/contacts
ruby create-attachments.rb ; creates /tmp/attachments
3. Load data into the test database
copy users from '/tmp/users';
copy contacts from '/tmp/contacts';
copy attachments from '/tmp/attachments';
4. Run the `create index` statements in the tables.sql file

I also have a pg_dump version if anyone wants it (~6MB gzipped).

/* NORMAL QUERY */
select
users.id, users.full_name, users.username,
ct.name as type_name,
c.lname, c.fname, c.mname, c.email,
c.address1, c.city, c.state, c.zip_code,
c.created_date,
count(a.id) as attachment_count

from
contacts c
inner join users on (users.id = c.user_id)
inner join contact_types ct on (ct.code = c.contact_type_code)
left join attachments a on (a.contact_id = c.id)

where
users.company_id = 1
and c.contact_type_code in ('BOSS', 'EMP', 'WORK')

group by
users.id, users.full_name, users.username,
ct.name, c.lname, c.fname, c.mname, c.email,
c.address1, c.city, c.state, c.zip_code,
c.created_date

order by
users.full_name,
ct.name, c.lname, c.fname, c.mname,
c.created_date

EXPLAIN ANALYZE OUTPUT:

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=102724.80..122894.80 rows=403400 width=124)
(actual time=30683.912..32431.153 rows=75228 loops=1)
-> Sort (cost=102724.80..103733.30 rows=403400 width=124) (actual
time=30683.869..31236.760 rows=345543 loops=1)
Sort Key: users.full_name, ct.name, c.lname, c.fname,
c.mname, c.created_date, users.id, users.username, c.email,
c.address1, c.city, c.state, c.zip_code
Sort Method: quicksort Memory: 96578kB
-> Merge Right Join (cost=16571.91..65164.53 rows=403400
width=124) (actual time=1946.609..7523.831 rows=345543 loops=1)
Merge Cond: (a.contact_id = c.id)
-> Index Scan using attachments_contact on attachments
a (cost=0.00..39729.87 rows=1124700 width=12) (actual
time=0.082..2150.242 rows=1124662 loops=1)
-> Sort (cost=16571.91..16732.30 rows=64157
width=124) (actual time=1946.463..2384.232 rows=345537 loops=1)
Sort Key: c.id
Sort Method: quicksort Memory: 21439kB
-> Hash Join (cost=553.92..11449.19 rows=64157
width=124) (actual time=81.120..1584.615 rows=75228 loops=1)
Hash Cond: ((c.contact_type_code)::text =
(ct.code)::text)
-> Hash Join (cost=552.81..10565.92
rows=64157 width=121) (actual time=81.063..1285.727 rows=75228
loops=1)
Hash Cond: (c.user_id = users.id)
-> Seq Scan on contacts c
(cost=0.00..7534.50 rows=122469 width=96) (actual time=0.023..526.259
rows=150077 loops=1)
Filter:
((contact_type_code)::text = ANY ('{BOSS,EMP,WORK}'::text[]))
-> Hash (cost=427.00..427.00
rows=10065 width=33) (actual time=80.974..80.974 rows=10065 loops=1)
-> Seq Scan on users
(cost=0.00..427.00 rows=10065 width=33) (actual time=0.022..37.797
rows=10065 loops=1)
Filter: (company_id = 1)
-> Hash (cost=1.05..1.05 rows=5 width=12)
(actual time=0.037..0.037 rows=5 loops=1)
-> Seq Scan on contact_types ct
(cost=0.00..1.05 rows=5 width=12) (actual time=0.018..0.024 rows=5
loops=1)
Total runtime: 32551.132 ms
(22 rows)

/* HACK QUERY */
select * from (
select
users.id, users.full_name, users.username,
ct.name as type_name,
c.lname, c.fname, c.mname, c.email,
c.address1, c.city, c.state, c.zip_code,
c.created_date,
count(a.id) as attachment_count

from
contacts c
inner join users on (users.id = c.user_id)
inner join contact_types ct on (ct.code = c.contact_type_code)
left join attachments a on (a.contact_id = c.id)

where
users.company_id = 1
and c.contact_type_code in ('BOSS', 'EMP', 'WORK')

group by
users.id, users.full_name, users.username,
ct.name, c.lname, c.fname, c.mname, c.email,
c.address1, c.city, c.state, c.zip_code,
c.created_date
) as results
order by
full_name,
type_name, lname, fname, mname,
created_date

/* EXPLAIN ANALYZE OUTPUT */

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=420987.80..421996.30 rows=403400 width=1688) (actual
time=12579.266..12663.383 rows=75228 loops=1)
Sort Key: users.full_name, ct.name, c.lname, c.fname, c.mname, c.created_date
Sort Method: quicksort Memory: 21435kB
-> HashAggregate (cost=79283.53..84326.03 rows=403400 width=124)
(actual time=9546.773..9721.322 rows=75228 loops=1)
-> Merge Right Join (cost=16571.91..65164.53 rows=403400
width=124) (actual time=1857.597..7428.718 rows=345543 loops=1)
Merge Cond: (a.contact_id = c.id)
-> Index Scan using attachments_contact on attachments
a (cost=0.00..39729.87 rows=1124700 width=12) (actual
time=0.862..2298.099 rows=1124662 loops=1)
-> Sort (cost=16571.91..16732.30 rows=64157
width=124) (actual time=1856.666..2279.646 rows=345537 loops=1)
Sort Key: c.id
Sort Method: quicksort Memory: 21439kB
-> Hash Join (cost=553.92..11449.19 rows=64157
width=124) (actual time=54.465..1500.539 rows=75228 loops=1)
Hash Cond: ((c.contact_type_code)::text =
(ct.code)::text)
-> Hash Join (cost=552.81..10565.92
rows=64157 width=121) (actual time=54.375..1210.697 rows=75228
loops=1)
Hash Cond: (c.user_id = users.id)
-> Seq Scan on contacts c
(cost=0.00..7534.50 rows=122469 width=96) (actual time=0.035..492.531
rows=150077 loops=1)
Filter:
((contact_type_code)::text = ANY ('{BOSS,EMP,WORK}'::text[]))
-> Hash (cost=427.00..427.00
rows=10065 width=33) (actual time=54.277..54.277 rows=10065 loops=1)
-> Seq Scan on users
(cost=0.00..427.00 rows=10065 width=33) (actual time=0.032..27.676
rows=10065 loops=1)
Filter: (company_id = 1)
-> Hash (cost=1.05..1.05 rows=5 width=12)
(actual time=0.055..0.055 rows=5 loops=1)
-> Seq Scan on contact_types ct
(cost=0.00..1.05 rows=5 width=12) (actual time=0.026..0.035 rows=5
loops=1)
Total runtime: 12752.540 ms
(22 rows)

Attachment Content-Type Size
tables.sql application/octet-stream 1.2 KB
create-scripts.zip application/zip 1.7 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2010-01-23 03:55:58 Re: BUG #5294: Sorts on more than just the order-by clause
Previous Message Bruce Patin 2010-01-22 15:26:55 BUG #5295: Function OUT parameters names and data types skewed with IN parameters