Re: Sort Refinement

From:	Sam Mason <sam(at)samason(dot)me(dot)uk>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Sort Refinement
Date:	2008-03-20 21:34:00
Message-ID:	20080320213400.GB26166@frubble.xen.chris-lamb.co.uk
Lists:	pgsql-hackers

On Thu, Mar 20, 2008 at 05:17:22PM +0000, Simon Riggs wrote:
> Currently, our sort algorithm assumes that its input is unsorted. So if
> your data is sorted on (a) and you would like it to be sorted on (a,b)
> then we need to perform the full sort of (a,b).
>
> For small sorts this doesn't matter much. For larger sorts the heap sort
> algorithm will typically result in just a single run being written to
> disk which must then be read back in. Number of I/Os required is twice
> the total volume of data to be sorted.
>
> If we assume we use heap sort, then if we *know* that the data is
> presorted on (a) then we should be able to emit tuples directly that the
> value of (a) changes and keep emitting them until the heap is empty,
> since they will exit the heap in (a,b) order.

We also have stats to help decide when this will be a win. For example,
if "a" has a small range (e.g. a boolean) and "b" has a large range
(e.g. some sequence) then this probably isn't going to be a win and
you're better off using the existing infrastructure. If it's the other
way around then this is going to be a big win.
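
To make the trade-off concrete, here is a rough sketch in Python (not
PostgreSQL code, and not a proposal for how tuplesort should do it). It
assumes rows are plain (a, b) tuples arriving already sorted on "a", and
the names refine_sort and max_group_size are made up for illustration.
Each run of equal "a" values is sorted on "b" by itself, so the largest
such run is the most that ever has to be held at once.

```python
from itertools import groupby
from operator import itemgetter


def refine_sort(rows):
    """Yield (a, b) rows in (a, b) order, assuming rows arrive sorted on a."""
    for _, group in groupby(rows, key=itemgetter(0)):
        # Only one run of equal "a" values is buffered and sorted at a time.
        yield from sorted(group, key=itemgetter(1))


def max_group_size(rows):
    """Size of the largest run of equal "a" values in the input."""
    sizes = (sum(1 for _ in group)
             for _, group in groupby(rows, key=itemgetter(0)))
    return max(sizes, default=0)


# Many distinct "a" values: runs are small, so each partial sort is cheap.
wide = [(1, 3), (1, 1), (2, 2), (2, 0), (3, 5)]

# Boolean "a": each run is about half the input, so little is saved.
narrow = [(False, i) for i in range(5, 0, -1)] + \
         [(True, i) for i in range(5, 0, -1)]
```

With the "wide" input the largest run is 2 rows out of 5; with the
"narrow" input it is 5 rows out of 10, which is the case where the
existing full sort is probably just as good.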

Sam

In response to

Sort Refinement at 2008-03-20 17:17:22 from Simon Riggs

Responses

Re: Sort Refinement at 2008-03-22 13:01:07 from Simon Riggs
