Quick Links

Re: Merge algorithms for large numbers of "tapes"

From:	"Dann Corbit" <DCorbit(at)connx(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc:	"Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Merge algorithms for large numbers of "tapes"
Date:	2006-03-08 20:39:55
Message-ID:	D425483C2C5C9F49B5B7A41F8944154757D5E3@postal.corporate.connx.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

I do not clearly understand the sorting code in PostgreSQL. If I did
have a good grasp of it, I would take a go at improving it.

Here are some suggestions of things that I know work really, really
well:

#1. Two pass merge (none of that silly poly-tape merge goo)

#2. Load ONLY the keys that are to be sorted into memory. Use a
pointer exchange sort, and do not move the physical rows of data at all.

I am pretty sure from this thread that PostgreSQL is not doing #1, and I
have no idea if it is doing #2.

A useful trick:
Since merge is mentioned, I should say something else about merge joins.
If you do not have room to load the sorted keys for bsearch, load every
kth key (where k is computed by sizeof merge_ram / sizeof key_data).
Then, when you have found the block the thing you are looking for by the
"kth key bsearch", bsearch that block.

Now, maybe PostrgeSQL already uses tricks better than these. I don't
know. But if they prove helpful suggestions I will be glad of it.

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> Sent: Wednesday, March 08, 2006 12:32 PM
> To: Jim C. Nasby
> Cc: Luke Lonergan; Simon Riggs; pgsql-hackers(at)postgreSQL(dot)org
> Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
>
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > But do fewer/longer sorted runs translate into not merging back to
disk?
> > I thought that was controlled by if we had to be able to rewind the
> > result set.
>
> A plain SELECT ... ORDER BY doesn't assume that anymore. It is still
> required for some cases such as the input to a merge join, but the
> on-the-fly-final-merge code is going to be used a lot more in 8.2 than
> it was before.
>
> regards, tom lane
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that
your
> message can get through to the mailing list cleanly

Responses

Re: Merge algorithms for large numbers of "tapes" at 2006-03-08 21:51:30 from Luke Lonergan
Re: Merge algorithms for large numbers of "tapes" at 2006-03-08 23:17:05 from Tom Lane
Re: Merge algorithms for large numbers of "tapes" at 2006-03-09 02:23:18 from Andrew Dunstan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	PFC	2006-03-08 21:23:05	Re: [HACKERS] Interval subtracting
Previous Message	Tom Lane	2006-03-08 20:32:09	Re: Merge algorithms for large numbers of "tapes"