Re: Merge algorithms for large numbers of "tapes"

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-08 20:39:55
Message-ID: D425483C2C5C9F49B5B7A41F8944154757D5E3@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I do not clearly understand the sorting code in PostgreSQL. If I did
have a good grasp of it, I would take a go at improving it.

Here are some suggestions of things that I know work really, really
well:

#1. Two pass merge (none of that silly poly-tape merge goo)

#2. Load ONLY the keys that are to be sorted into memory. Use a
pointer exchange sort, and do not move the physical rows of data at all.

I am pretty sure from this thread that PostgreSQL is not doing #1, and I
have no idea if it is doing #2.

A useful trick:
Since merge is mentioned, I should say something else about merge joins.
If you do not have room to load the sorted keys for bsearch, load every
kth key (where k is computed by sizeof merge_ram / sizeof key_data).
Then, when you have found the block the thing you are looking for by the
"kth key bsearch", bsearch that block.

Now, maybe PostrgeSQL already uses tricks better than these. I don't
know. But if they prove helpful suggestions I will be glad of it.

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> Sent: Wednesday, March 08, 2006 12:32 PM
> To: Jim C. Nasby
> Cc: Luke Lonergan; Simon Riggs; pgsql-hackers(at)postgreSQL(dot)org
> Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
>
> "Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> > But do fewer/longer sorted runs translate into not merging back to
disk?
> > I thought that was controlled by if we had to be able to rewind the
> > result set.
>
> A plain SELECT ... ORDER BY doesn't assume that anymore. It is still
> required for some cases such as the input to a merge join, but the
> on-the-fly-final-merge code is going to be used a lot more in 8.2 than
> it was before.
>
> regards, tom lane
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that
your
> message can get through to the mailing list cleanly

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message PFC 2006-03-08 21:23:05 Re: [HACKERS] Interval subtracting
Previous Message Tom Lane 2006-03-08 20:32:09 Re: Merge algorithms for large numbers of "tapes"