Re: Merge algorithms for large numbers of "tapes"

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Dann Corbit <DCorbit(at)connx(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Luke Lonergan <llonergan(at)greenplum(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Merge algorithms for large numbers of "tapes"
Date: 2006-03-09 01:44:08
Message-ID: 20060309014408.GN45250@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 08, 2006 at 03:35:53PM -0800, Dann Corbit wrote:
> I think I did not explain it clearly enough. Suppose that you have a
> set of rows you need to sort. Instead of loading the whole row into
> memory, just load the columns (or parts of columns) that are being
> sorted. I hope that it is more clear now.

The issue is that there is a non-trivial amount of overhead in going
back to disk to get the raw data, and then you have to parse that into a
valid in-memory tuple. A worst-case scenario is if you're sorting all
the data that you've been asked to retrieve, ie:

SELECT a, b, c ... ORDER BY b, a, c;

That case is almost guaranteed to take longer if you try and do it with
just pointers.

But there is the other case:

SELECT a, b, c, big_honking_text_field ... ORDER BY a, b, c;

In this example it's entirely possible that leaving the big_honking
field out of the actual sorting would be a big win. Especially if your
temporary space was on a different set of spindles.

Regarding your suggestion of testing different kinds of sorts, that's
certainly a good idea if it can be done without a huge amount of work
coding each one up. Ultimately, it might make the most sense to support
multiple sort algorithms (at least for now) and let the planner decide
which one to use. That would at least get us a lot more real-world data
than any other method would.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dann Corbit 2006-03-09 02:05:01 Re: Merge algorithms for large numbers of "tapes"
Previous Message Jonah H. Harris 2006-03-09 01:43:39 Re: Merge algorithms for large numbers of "tapes"