Re: Merge algorithms for large numbers of "tapes"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-08 23:17:05
Message-ID: 18919.1141859825@sss.pgh.pa.us
Lists: pgsql-hackers

"Dann Corbit" <DCorbit(at)connx(dot)com> writes:
> Here are some suggestions of things that I know work really, really
> well:
> #1. Two pass merge (none of that silly poly-tape merge goo)

This amounts to an assumption that you have infinite work_mem, in which
case you hardly need an external sort at all. If your work_mem is in
fact finite, then at some point you need more than two passes. I'm not
really interested in ripping out support for sort operations that are
much larger than work_mem.
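To make the arithmetic concrete, here is a rough back-of-the-envelope model (not PostgreSQL's actual tuplesort code; the function name and parameters are purely illustrative) of how merge passes accumulate once the number of initial runs exceeds the merge fan-in:

```python
import math

def merge_passes(total_rows, workmem_rows, fan_in):
    # Hypothetical model, not PostgreSQL's tuplesort: assume
    # replacement selection produces initial runs averaging twice
    # the size of working memory, and each merge pass combines
    # fan_in runs into one until a single run remains.
    runs = math.ceil(total_rows / (2 * workmem_rows))
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        passes += 1
    return passes

# With a billion rows, a million rows of work_mem, and a 6-way
# merge: 500 initial runs need 4 merge passes, so a scheme that
# assumes run formation plus exactly one merge pass cannot work.
```

Under this model a "two pass" sort (run formation plus a single merge) only holds when the run count stays at or below the fan-in, i.e. when memory is large relative to the input, which is exactly the infinite-work_mem assumption.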

> #2. Load ONLY the keys that are to be sorted into memory. Use a
> pointer exchange sort, and do not move the physical rows of data at all.

This suggestion isn't a whole lot better; in general the rows to be
sorted don't exist until we compute them, and so proposing that we
"don't load them until later" is pretty much irrelevant. Also, in
a lot of common cases the keys to be sorted are the bulk of the data
anyway.
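For concreteness, the pointer-exchange idea being discussed amounts to sorting an array of row indices by key while leaving the rows themselves in place; a minimal sketch (names are hypothetical, Python used only for clarity):

```python
def index_sort(rows, key):
    # Sort only an array of indices ("pointers") by the key
    # extracted from each row; the physical rows are never
    # moved or copied.
    return sorted(range(len(rows)), key=lambda i: key(rows[i]))

rows = [("banana", 2), ("apple", 1), ("cherry", 3)]
order = index_sort(rows, key=lambda r: r[0])
# order == [1, 0, 2]; read the data in sorted order via rows[order[k]]
```

Note this sketch sidesteps both objections above: it presumes the rows already sit in memory, and when the sort keys are most of each row, sorting pointers instead of rows saves little.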

regards, tom lane
