Why is sorting on two columns so slower than sorting on one column?

From: Jie Li <jay23jack(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Why is sorting on two columns so slower than sorting on one column?
Date: 2010-12-23 07:33:12
Message-ID: AANLkTi=VK+kLjEvVdEL7y9P-rPE3OJVzqruR59EPmkj+@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Here is the test table,

postgres=# \d big_wf
Table "public.big_wf"
Column | Type | Modifiers
--------+---------+-----------
age | integer |
id | integer |

postgres=# \dt+ big_wf
List of relations
Schema | Name | Type | Owner | Size | Description
--------+--------+-------+----------+--------+-------------
public | big_wf | table | workshop | 142 MB |

The first query sorting on one column:
postgres=# explain analyze select * from big_wf order by age;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
time=11228.155..16427.149 rows=4100000 loops=1)
Sort Key: age
Sort Method: external sort Disk: 72112kB
-> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
(actual time=6.196..4797.620 rows=4100000 loops=1)
Total runtime: 19530.452 ms
(5 rows)

The second query sorting on two columns:
postgres=# explain analyze select * from big_wf order by age,id;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
time=37544.779..48206.702 rows=4100000 loops=1)
Sort Key: age, id
Sort Method: external merge Disk: 72048kB
-> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
(actual time=6.796..5518.663 rows=4100000 loops=1)
Total runtime: 51258.000 ms
(5 rows)

The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
first column(age) of all the tuples are of the same value, so the second
column(id) is always needed for comparison. While the first sorting takes
about only 6 seconds, the second one takes over 30 seconds, Is this too
much than expected? Is there any possible optimization ?

Thanks,
Li Jie

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2010-12-23 08:10:26 recapitulation: FOREACH-IN-ARRAY
Previous Message Sushant Sinha 2010-12-23 05:35:30 Re: english parser in text search: support for multiple words in the same position