Re: Why is sorting on two columns so slower thansortingon one column?

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Li Jie <jay23jack(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is sorting on two columns so slower thansortingon one column?
Date: 2010-12-23 14:53:05
Message-ID: 20101223145305.GN10252@aart.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 23, 2010 at 10:42:26PM +0800, Li Jie wrote:
> ----- Original Message -----
> From: "Kenneth Marshall" <ktm(at)rice(dot)edu>
> To: "Li Jie" <jay23jack(at)gmail(dot)com>
> Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
> Sent: Thursday, December 23, 2010 10:30 PM
> Subject: Re: [HACKERS] Why is sorting on two columns so slower thansortingon one column?
>
>
> > On Thu, Dec 23, 2010 at 10:19:46PM +0800, Li Jie wrote:
> >> Hi Ken,
> >>
> >> Thanks for your tips! Yes it is the case, and I run another query sorting on the second column whose values are random.
> >>
> >> postgres=# explain analyze select * from big_wf order by id;
> >> QUERY PLAN
> >> -------------------------------------------------------------------------------------------------------------------------
> >> Sort (cost=565525.45..575775.45 rows=4100000 width=8) (actual time=25681.875..36458.824 rows=4100000 loops=1)
> >> Sort Key: id
> >> Sort Method: external merge Disk: 72048kB
> >> -> Seq Scan on big_wf (cost=0.00..59142.00 rows=4100000 width=8) (actual time=8.595..5569.500 rows=4100000 loops=1)
> >>
> >> Now the sorting takes about 20 seconds, so it seems reasonable compared to 30 seconds, right? But one thing I'm confused is that, why is additional comparison really so expensive? Does it incur additional I/O? From the cost model, it seems not, all the "cost" are the same (575775.45).
> >>
> >> Thanks,
> >> Li Jie
> >
> > In the first query, the cost is basically the I/O cost to read the
> > table from disk. The actual sort does not do anything since the
> > sort values are the same. In the second query, the sort has to
> > swap things in memory/disk to get them in the correct order for
> > the result. This actually takes CPU and possibly additional I/O
> > which is why it is slower. In the case of sorting by just the "id"
> > column, the size of the sorted values is smaller which would need
> > fewer batches to complete the sort since the sort is bigger than
> > the work_mem.
> >
> > Cheers,
> > Ken
>
> Hi Ken,
>
> Thanks for your analysis.
>
> But in the last query that sorts on "id", since the query selects all the columns for output, the actual sorted size is the same, and the only difference is the comparison cost. The query sorting on two columns needs to do twice the comparison. Am I right?
>
> Thanks,
> Li Jie

I think you are right. Sorry for the confusion.

Ken

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-12-23 15:15:21 Re: Streaming replication as a separate permissions
Previous Message Li Jie 2010-12-23 14:42:26 Re: Why is sorting on two columns so slower thansortingon one column?