Re: Sorting Improvements for 8.4

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Sorting Improvements for 8.4
Date: 2007-12-03 22:35:18
Message-ID: 1196721318.22428.477.camel@dogma.ljc.laika.com
Lists: pgsql-hackers

On Mon, 2007-12-03 at 20:40 +0000, Gregory Stark wrote:
> So the question is just how many seeks are we doing during sorting. If we're
> doing 0.1% seeks and 99.9% sequential i/o then eliminating the 1% entirely
> (which we can't do) isn't going to speed up seeking all that much. If we're
> doing 20% seeks and can get that down to 10% it might be worthwhile.

It's not just about eliminating seeks; it's about being able to merge
more runs at one time.

If you are merging 10 runs at once, and only two of those runs overlap
while the rest hold much greater values, you might be spending 99% of
the time in sequential I/O.

But the point is, we're wasting the memory that holds those other 8 runs
(80% of the memory you're using), so we really could be merging a lot
more than 10 runs at once. This might eliminate stages from the merge
process.
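
As a rough illustration of why a larger merge order can remove a stage,
here is a small sketch that counts merge passes. The function name and
the numbers (40 runs, merge orders of 10 and 50) are made up for the
example and are not taken from the sort code:

```python
def merge_passes(num_runs: int, fan_in: int) -> int:
    """Count the merge passes needed to reduce num_runs sorted runs to one,
    merging at most fan_in runs at a time (hypothetical helper)."""
    if num_runs < 1 or fan_in < 2:
        raise ValueError("need at least one run and a fan-in of two or more")
    passes = 0
    while num_runs > 1:
        # Each pass turns every group of up to fan_in runs into one run.
        num_runs = -(-num_runs // fan_in)  # ceiling division
        passes += 1
    return passes


# Assumed numbers: 40 initial runs.
print(merge_passes(40, 10))  # 40 -> 4 -> 1
print(merge_passes(40, 50))  # 40 -> 1
```

With these assumed numbers, a merge order of 10 needs two passes over
the data, while a merge order of 50 needs only one.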

My point is just that "how many seeks are we doing" is not the only
question. We could be doing 99% sequential I/O and still make huge wins.

In reality, of course, the runs aren't going to be completely disjoint,
but they may be partially disjoint. That's where forecasting comes in:
you preread from the tapes you will actually need tuples from soonest.
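
A minimal sketch of the forecasting idea, assuming each run is an
in-memory sorted list split into fixed-size blocks: the buffer that
empties first is the one whose last key is smallest, so the next block
to preread comes from that run. The function name and the toy data are
hypothetical, and this only simulates the read order; it is not how any
real tape code is written:

```python
import heapq


def forecast_read_order(runs, block_size):
    """Return (run_index, block_index) pairs in the order a forecasting
    merge would read blocks from the given sorted runs."""
    blocks = [
        [run[i:i + block_size] for i in range(0, len(run), block_size)]
        for run in runs
    ]
    order = []
    heap = []  # entries: (last key of buffered block, run index, block index)
    for r, run_blocks in enumerate(blocks):
        if run_blocks:
            # Every run needs its first block before merging can start.
            order.append((r, 0))
            heapq.heappush(heap, (run_blocks[0][-1], r, 0))
    while heap:
        # The block with the smallest last key is exhausted first,
        # so its run is the one to preread from next.
        _, r, b = heapq.heappop(heap)
        nxt = b + 1
        if nxt < len(blocks[r]):
            order.append((r, nxt))
            heapq.heappush(heap, (blocks[r][nxt][-1], r, nxt))
    return order


# Two overlapping runs and one run of much greater values (toy data).
runs = [[1, 3, 5, 7], [2, 4, 6, 8], [100, 101, 102, 103]]
print(forecast_read_order(runs, block_size=2))
```

In this toy case the second block of the third run is read last, after
both overlapping runs have been fully read, which is the situation where
buffer memory for the third run would otherwise sit idle.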

Regards,
Jeff Davis
