Re: [HACKERS] A Better External Sort?

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Ron Peacetree" <rjpeace(at)earthlink(dot)net>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] A Better External Sort?
Date: 2005-10-01 06:32:40
Message-ID: D425483C2C5C9F49B5B7A41F8944154757D148@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> Sent: Friday, September 30, 2005 11:02 PM
> To: Jeffrey W. Baker
> Cc: Luke Lonergan; Josh Berkus; Ron Peacetree; pgsql-
> hackers(at)postgresql(dot)org; pgsql-performance(at)postgresql(dot)org
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
> "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> > I think the largest speedup will be to dump the multiphase merge and
> > merge all tapes in one pass, no matter how large M. Currently M is
> > capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes
over
> > the tape. It could be done in a single pass heap merge with
N*log(M)
> > comparisons, and, more importantly, far less input and output.
>
> I had more or less despaired of this thread yielding any usable ideas
> :-( but I think you have one here.

I believe I made the exact same suggestion several days ago.

>The reason the current code uses a
> six-way merge is that Knuth's figure 70 (p. 273 of volume 3 first
> edition) shows that there's not much incremental gain from using more
> tapes ... if you are in the regime where number of runs is much
greater
> than number of tape drives. But if you can stay in the regime where
> only one merge pass is needed, that is obviously a win.
>
> I don't believe we can simply legislate that there be only one merge
> pass. That would mean that, if we end up with N runs after the
initial
> run-forming phase, we need to fit N tuples in memory --- no matter how
> large N is, or how small work_mem is. But it seems like a good idea
to
> try to use an N-way merge where N is as large as work_mem will allow.
> We'd not have to decide on the value of N until after we've completed
> the run-forming phase, at which time we've already seen every tuple
> once, and so we can compute a safe value for N as work_mem divided by
> largest_tuple_size. (Tape I/O buffers would have to be counted too
> of course.)

You only need to hold the sort column(s) in memory, except for the queue
you are exhausting at the time. [And of those columns, only the values
for the smallest one in a sub-list.] Of course, the more data from each
list that you can hold at once, the fewer the disk reads and seeks.

Another idea (not sure if it is pertinent):
Instead of having a fixed size for the sort buffers, size it to the
query. Given a total pool of size M, give a percentage according to the
difficulty of the work to perform. So a query with 3 small columns and
a cardinality of 1000 gets a small percentage and a query with 10 GB of
data gets a big percentage of available sort mem.

> It's been a good while since I looked at the sort code, and so I don't
> recall if there are any fundamental reasons for having a compile-time-
> constant value of the merge order rather than choosing it at runtime.
> My guess is that any inefficiencies added by making it variable would
> be well repaid by the potential savings in I/O.
>
> regards, tom lane
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 6: explain analyze is your friend

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Fuhr 2005-10-01 06:40:10 Re: Expression index ignores column statistics target
Previous Message Tom Lane 2005-10-01 06:19:06 Re: Expression index ignores column statistics target

Browse pgsql-performance by date

  From Date Subject
Next Message Simon Riggs 2005-10-01 09:29:52 Re: [HACKERS] A Better External Sort?
Previous Message Tom Lane 2005-10-01 06:01:58 Re: [HACKERS] A Better External Sort?