Re: Using quicksort for every external sort run

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using quicksort for every external sort run
Date: 2016-04-07 13:55:41
Message-ID: CA+TgmoYi9Q8KsVi-q2Wt1B-EsEdX4OTRJ7iWs_ypSVEk8DoXTw@mail.gmail.com
Lists: pgsql-hackers

Sorry for not responding to this thread again sooner. I was on
vacation Thursday-Sunday, and have been playing catch-up since then.

On Sun, Apr 3, 2016 at 8:24 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Secondly, master is faster only if there's enough on-CPU cache for the
> replacement sort (for the memtuples heap), but the benchmark is not
> realistic in this respect as it only ran 1 query at a time, so it used the
> whole cache (6MB for i5, 12MB for Xeon).
>
> In reality there will be multiple processes running at the same time (e.g.
> backends when running parallel query), significantly reducing the amount of
> cache per process, making the replacement sort inefficient and thus
> eliminating the regressions (by making the master slower).

Interesting point.

> 3) replacement_sort_mem GUC
>
> I'm not quite sure what's the plan with this GUC. It was useful for
> development, but it seems to me it's pretty difficult to tune it in practice
> (especially if you don't know the internals, which users generally don't).
>
> The current patch includes the new GUC right next to work_mem, which seems
> rather unfortunate - I do expect users to simply mess with it, assuming "more
> is better", which seems to be a rather poor idea.
>
> So I think we should either remove the GUC entirely, or move it to the
> developer section next to trace_sort (and removing it from the conf).

I certainly agree that GUCs that aren't easy to tune are bad. I'm
wondering whether the fact that this one is hard to tune is something
that can be fixed. The comments about "padding" - a term I don't
like, because to me it implies a deliberate attempt to game the
benchmark when in reality wanting to sort a wide row is entirely
reasonable - make me wonder if this should be based on a number of
tuples rather than an amount of memory. If considering the row width
makes us get the wrong answer, then let's not do that.

> BTW couldn't we tune the value automatically for each sort, using the
> pg_stats.correlation for the sort keys, when available (increasing the
> replacement_sort_mem when correlation is close to 1.0)? Wouldn't that
> improve at least some of the regressions?

Surely not for 9.6.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
