Re: Merge algorithms for large numbers of "tapes"

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Dann Corbit <DCorbit(at)connx(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-09 23:48:56
Message-ID: 20060309234856.GP4474@ns.snowman.net
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
> > I would only suggest that we replace the existing algorithm with one that
> > will work regardless of (reasonable) memory requirements. Perhaps we can
> > agree that at least 1MB of RAM for external sorting will always be available
> > and proceed from there?
>
> If you can sort indefinitely large amounts of data with 1MB work_mem,
> go for it.

It seems you two are talking past each other and I'm at least slightly
confused. So, I'd like to ask for a bit of clarification and perhaps
that will help everyone.

#1: I'm as much a fan of eliminating unnecessary code as anyone
#2: There have been claims that a two-pass merge improves performance by 400%
#3: Supposedly a two-pass merge requires on the order of sqrt(total) memory
#4: We have planner statistics to estimate the total size
#5: We have a work_mem limitation for a reason
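Item #3 can be made concrete with a back-of-the-envelope model. This is a minimal sketch under a simplified model, not what tuplesort.c actually does. The first pass writes sorted runs as large as memory (M bytes), which gives ceil(total/M) runs. The second pass merges them all at once and needs one block-sized input buffer per run, so ceil(total/M) * block <= M, i.e. M >= sqrt(total * block). The function names are hypothetical. The 8192-byte block is just an example value (PostgreSQL's default BLCKSZ).

```python
from math import isqrt


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding on large sizes)."""
    return -(-a // b)


def min_two_pass_memory(total_bytes: int, block_bytes: int) -> int:
    """Smallest memory M (bytes) for which a single merge pass suffices.

    Assumed model: runs are M bytes each, so there are ceil(total/M) runs,
    and the merge needs one block_bytes input buffer per run inside M.
    """
    # Any M below sqrt(total * block) fails, so start the search there.
    m = max(1, isqrt(total_bytes * block_bytes))
    # The condition is monotone in M, so the first M that passes is minimal.
    while _ceil_div(total_bytes, m) * block_bytes > m:
        m += 1
    return m


def max_two_pass_input(mem_bytes: int, block_bytes: int) -> int:
    """Largest input (bytes) that the same model can sort in two passes."""
    # At most mem/block runs can be merged at once, each up to mem bytes long.
    return (mem_bytes // block_bytes) * mem_bytes


print(max_two_pass_input(1 << 20, 8192))
```

Under these assumptions, 1 MB of memory with 8 kB blocks covers inputs up to 128 MB in two passes. Real limits also depend on per-tuple overhead, output buffering and replacement selection (which roughly doubles expected run length on random input), so treat the result as an order-of-magnitude estimate only.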

So, if we get a huge performance increase, what's wrong with:
if [ sqrt(est(total)) <= work_mem ]; then
    two-pass-sort();
else
    tape-sort();
fi

?
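The dispatch above can be sketched as runnable Python. This is an illustration under stated assumptions, not PostgreSQL's tuplesort.c. In-memory lists stand in for temp files. `mem_items` (how many items fit in memory while forming runs) and `fan_in` (how many runs can be merged at once, i.e. how many input buffers fit) are hypothetical parameters. The fallback is a simple balanced multi-pass merge standing in for the polyphase "tape" merge, not a reimplementation of it.

```python
import heapq


def make_runs(data: list[int], mem_items: int) -> list[list[int]]:
    """Pass 1: cut the input into memory-sized chunks and sort each chunk."""
    return [sorted(data[i:i + mem_items]) for i in range(0, len(data), mem_items)]


def two_pass_sort(runs: list[list[int]]) -> list[int]:
    """Pass 2: merge every run at once (needs one input buffer per run)."""
    return list(heapq.merge(*runs))


def multi_pass_sort(runs: list[list[int]], fan_in: int) -> list[int]:
    """Fallback: merge at most fan_in runs at a time until one run remains.

    A balanced multi-pass merge used here in place of a polyphase tape merge.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not runs:
        return []
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
    return runs[0]


def external_sort(data: list[int], mem_items: int, fan_in: int) -> list[int]:
    """Pick the single-merge-pass path only when every run gets a buffer."""
    runs = make_runs(data, mem_items)
    if len(runs) <= fan_in:
        return two_pass_sort(runs)
    return multi_pass_sort(runs, fan_in)


data = [(i * 7919) % 1009 for i in range(1000)]
print(external_sort(data, 100, 16)[:5])  # 10 runs <= 16: single merge pass
print(external_sort(data, 10, 4)[:5])    # 100 runs > 4: multi-pass fallback
```

Here the test `len(runs) <= fan_in` plays the role of `sqrt(est(total)) <= work_mem`. Making the decision after run formation, when the real number of runs is known, is one design choice that avoids depending on the planner's estimate being accurate.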

If the performance isn't much different and tape-sort can do it with
less memory, then I don't really see any point in removing it.

If the intent is to remove it and then ask for the default work_mem to
be increased, I doubt going about it this way would work very well. :)

Thanks,

Stephen
