Re: Why is sorting on two columns so slower than sortingon one column?

From: "Li Jie" <jay23jack(at)gmail(dot)com>
To: "Kenneth Marshall" <ktm(at)rice(dot)edu>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is sorting on two columns so slower than sortingon one column?
Date: 2010-12-23 14:19:46
Message-ID: 000c01cba2ac$757caa10$2ad118ac@A0078508
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ken,

Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are random.

postgres=# explain analyze select * from big_wf order by id;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
Sort Key: id
Sort Method: external merge Disk: 72048kB
-> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000 loops=1)

Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm confused is that, why is additional comparison really so expensive? Does it incur additional I/O? From the cost model, it seems not, all the "cost" are the same (575775.45).

Thanks,
Li Jie

----- Original Message -----
From: "Kenneth Marshall" <ktm(at)rice(dot)edu>
To: "Jie Li" <jay23jack(at)gmail(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Thursday, December 23, 2010 10:03 PM
Subject: Re: [HACKERS] Why is sorting on two columns so slower than sortingon one column?

> On Thu, Dec 23, 2010 at 02:33:12AM -0500, Jie Li wrote:
>> Hi,
>>
>> Here is the test table,
>>
>> postgres=# \d big_wf
>> Table "public.big_wf"
>> Column | Type | Modifiers
>> --------+---------+-----------
>> age | integer |
>> id | integer |
>>
>> postgres=# \dt+ big_wf
>> List of relations
>> Schema | Name | Type | Owner | Size | Description
>> --------+--------+-------+----------+--------+-------------
>> public | big_wf | table | workshop | 142 MB |
>>
>>
>> The first query sorting on one column:
>> postgres=# explain analyze select * from big_wf order by age;
>> QUERY
>> PLAN
>> -------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
>> time=11228.155..16427.149 rows=4100000 loops=1)
>> Sort Key: age
>> Sort Method: external sort Disk: 72112kB
>> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
>> (actual time=6.196..4797.620 rows=4100000 loops=1)
>> Total runtime: 19530.452 ms
>> (5 rows)
>>
>> The second query sorting on two columns:
>> postgres=# explain analyze select * from big_wf order by age,id;
>> QUERY
>> PLAN
>> -------------------------------------------------------------------------------------------------------------------------
>> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual
>> time=37544.779..48206.702 rows=4100000 loops=1)
>> Sort Key: age, id
>> Sort Method: external merge Disk: 72048kB
>> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8)
>> (actual time=6.796..5518.663 rows=4100000 loops=1)
>> Total runtime: 51258.000 ms
>> (5 rows)
>>
>> The verision is 9.0.1 and the work_mem is 20MB. One special thing is, the
>> first column(age) of all the tuples are of the same value, so the second
>> column(id) is always needed for comparison. While the first sorting takes
>> about only 6 seconds, the second one takes over 30 seconds, Is this too
>> much than expected? Is there any possible optimization ?
>>
>> Thanks,
>> Li Jie
>
> Hi Li,
>
> If I understand your description, in the first query the sort does
> not actually have to do anything because the column values for "age"
> are all degenerate. In the second query, you actually need to sort
> the values which is why it takes longer. If the first column values
> are the same, then simply sorting by "id" alone would be faster.
> You could also bump up work_mem for the query to perform the sort
> in memory.
>
> Regards,
> Ken
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Quan Zongliang 2010-12-23 14:26:29 Re: Patch BUG #5103: "pg_ctl -w (re)start" fails with custom unix_socket_directory
Previous Message Marti Raudsepp 2010-12-23 14:17:51 Re: Why is sorting on two columns so slower than sorting on one column?