Re: Sorting Improvements for 8.4

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting Improvements for 8.4
Date: 2007-11-30 20:07:33
Message-ID: 1196453253.22428.416.camel@dogma.ljc.laika.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2007-11-27 at 18:03 +0000, Simon Riggs wrote:
> 5. DYNAMIC RUN HANDLING (in Final Merge)
>
> Another way of addressing a) is to simply make better use of memory
> itself. Let's look at that in more detail:
>
> Number of runs that can be merged at once is currently fixed, based upon
> available memory. This has the underlying assumption that all runs will
> be concurrently active during final merging, which may not always be
> true.
>
> If we have random data then almost all runs will overlap with all other
> runs, i.e. the min and max values are sufficiently wide that the runs do
> all overlap. In many cases, data arrives in somewhat sorted order, e.g.
> financial data is fairly regular with some late payers but not many, and
> those trail off with a fairly tight decay. In the somewhat sorted case
> we find that the actual overlap is less than total, so there are many
> later runs that don't overlap the earlier ones. In the best case we
> would find run 1 and 2 overlap, runs 2 and 3 overlap, then 3 and 4
> overlap.

I have spoken with Len Shapiro, a professor at Portland State
University, regarding sorting before.

He suggests that PostgreSQL should implement forecasting, which is
similar to what you're describing. Forecasting does not require that
entire runs are disjoint, it works by tracking the maximum values from
the last block read from every run. This allows you to know which run
you will need more blocks from the soonest.

I'm still looking into the problem to understand it better, but the
algorithm is in Knuth Vol 3.

I can look at it in more detail, but have you already looked into this
idea? Is there a reason we don't do this currently?

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2007-11-30 20:22:07 Re: [HACKERS] [GENERAL] plperl and regexps with accented characters - incompatible?
Previous Message Sam Mason 2007-11-30 19:20:13 Re: PostGreSQL and recursive queries...