Re: Using quicksort for every external sort run

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2016-04-07 18:05:11
Message-ID: CA+TgmoZbwm-hpu2dUWKD7JZGTc=ByB9VDK0O0LiAmbPRQ8a5KQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 21, 2016 at 2:01 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Mar 17, 2016 at 1:13 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> OK, I have now committed 0001
>
> I attach a revision of the external quicksort patch and supplementary
> small patches, rebased on top of the master branch.

I spent some time today reading through the new 0001 and in general I
think it looks pretty good. But I think that there is some stuff in
there that logically seems to me to deserve to be separate patches.
In particular:

1. Changing cost_sort to consider disk access as 90% sequential, 10%
random rather than 75% sequential, 25% random. As far as I can recall
from the thread, zero test results have been posted to demonstrate
that this is a good idea. It also seems completely unprincipled. If
the cost of sorts decreases as a result of this patch, it is because
we've reduced the CPU cost, not the I/O cost. The changes we're
talking about here make I/O more random, not less random, because we
will now have more tapes, not fewer; which means merges will have to
seek the disk head more frequently, not less frequently. Now, it's
tempting to say that this patch should result in some change to the
cost model: if the patch doesn't make sorting faster, we shouldn't
commit it at all, and if it does, then surely the cost model should
change accordingly. But the question for the cost model isn't whether
the change to the model somehow reflects the increase in execution
speed. It's whether we get better query plans with the change than
without. I don't think there's been a degree of review of that aspect
of this patch on list that would give me confidence to commit a change
like this.

2. Restricting the maximum number of tapes to 500. This seems like a
sound change and I don't object to it in theory. But I've seen no
benchmark results which demonstrate that this is a good idea, and it
is quite separate from the core purpose of the patch.

Since time is short, I recommend we remove both of these things from
the patch and you can resubmit them as separate patches later. As far
as I can see, neither of them is so tied into the rest of the patch
that the main part of the patch can't be committed without those
changes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-04-07 18:10:58 Re: Using quicksort for every external sort run
Previous Message Tom Lane 2016-04-07 18:03:03 Re: Re: [COMMITTERS] pgsql: Add trigonometric functions that work in degrees.