Re: Sort performance

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sort performance
Date: 2006-09-01 15:49:17
Message-ID: 87d5afio3m.fsf@stark.enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Luke Lonergan" <LLonergan(at)greenplum(dot)com> writes:

> What version of pgsql?
>
> Recent changes stripped the sort set down considerably in size in external
> sort, I'm not sure the same is done if the data doesn't spill to disk.

This is a recent CVS checkout.

If you're referring to MinimalTuples I think that's done before tuplesort ever
sees the tuples. Besides when swapping things around in memory only the first
datum and a pointer to the rest of the object actually gets moved around. I
think.

Now that I've investigated further I'm even more confused though. The cases
where I'm seeing external sorts outperform internal sorts are when it just
barely exceeds work_mem which means it's only doing one merge pass between
initial tapes generated using inittapes. That means most of the work is
actually being done using in-memory sorts. Guess what algorithm we use to
generate initial tapes: heap sort!

> * See Knuth, volume 3, for more than you want to know about the external
> * sorting algorithm. We divide the input into sorted runs using replacement
> * selection, in the form of a priority tree implemented as a heap
> * (essentially his Algorithm 5.2.3H),

So basically our heap sort implementation is 3x as fast a glibc's qsort
implementation?! Is that believable?

Certainly I don't get results like that if I just change the code to do a heap
sort instead of qsort. I see it being substantially slower.

[aside, that said that may be a useful feature to have: a user option to use
our internal heap sort instead of qsort. I'm thinking of users on platforms
where libc's qsort either performs poorly or is buggy. Since we have all the
code for heap sort there already anyways...]

I feel like I'm missing some extra work tuplesort is doing (possibly
needlessly) in addition to the qsort. Now I'm getting paranoid that perhaps
this is just a bug in my hacked up copy of this code. I can't see how that
could be but I'll try reproducing it with stock CVS Postgres.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Treat 2006-09-01 15:50:36 Re: Thought provoking piece on NetBSD
Previous Message Tom Lane 2006-09-01 15:41:16 Re: [PATCHES] Updatable views