Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-12-02 19:26:00
Message-ID: CAM3SWZTDzBFxRig3rsWydxDNnHfg+AJYEiEydtQL0krqVTqKgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 2, 2015 at 10:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I'm not very concerned about a regression that is only seen when
>> work_mem is set below the (very conservative) postgresql.conf default
>> value of 4MB when sorting 100 million integers.
>
> Perhaps surprisingly, I tend to agree. I'm cautious of regressions
> here, but large sorts in queries are relatively uncommon. You're
> certainly not going to want to return a 100 million tuples to the
> client.

Right. The fact that it was only a 5% regression is also a big part of
what made me unconcerned. I am glad that we've characterized the
regression that I assumed was there, though -- I certainly knew that
Knuth and so on were not wrong to emphasize increasing run size in the
1970s. Volume 3 of The Art of Computer Programming literally has a
pull-out chart showing the timing of external sorts. This includes the
time it takes for a human operator to switch magnetic tapes, and
rewind those tapes. The underlying technology has changed rather a lot
since, of course.

> While large sorts are uncommon in queries, they are much more common
> in index builds. Therefore, I think we ought to be worrying more
> about regressions at 64MB than at 4MB, because we ship with
> maintenance_work_mem = 64MB and a lot of people probably don't change
> it before trying to build an index. If we make those index builds go
> faster, users will be happy. If we make them go slower, users will be
> sad. So I think it's worth asking the question "are there any CREATE
> INDEX commands that someone might type on a system on which they've
> done no other configuration that will be slower with this patch"?

I certainly agree that that's a good place to focus. I think that it's
far, far less likely that anything will be slowed down when you take
this as a cut-off point. I don't want to overemphasize it, but the
analysis of how many more passes are needed because of lack of a
replacement selection heap (the "quadratic growth" thing) gives me
confidence. A case with less than 4MB of work_mem is where we actually
saw *some* regression.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-12-02 19:26:56 Re: broken tests
Previous Message cevian 2015-12-02 19:25:15 Postgres_FDW optimizations