Re: Using quicksort for every external sort run

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using quicksort for every external sort run
Date: 2016-04-03 12:24:53
Message-ID: 27716590-56b5-baf0-71ee-d27f309c1acc@2ndquadrant.com
Lists: pgsql-hackers

Hi,

So, let me sum up the current status as I understand it.

1) overall, the patch seems to be a clear performance improvement

There are far more "green" cells than "red" ones in the spreadsheets,
and the patch often shaves 30-75% off the sort duration. The
improvements are pretty much across the board, for all data sets
(low/high/unique cardinality, initial ordering) and data types.
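
For reference, the benchmarked cases were roughly of this shape
(illustrative only - the actual tables and queries are in the
spreadsheets):

  -- one of the data sets: 10M rows, high cardinality, random order
  CREATE TABLE sort_test AS
    SELECT (random() * 1000000)::int AS a
      FROM generate_series(1, 10000000) s(i);

  SET work_mem = '8MB';  -- low enough to force an external sort
  EXPLAIN ANALYZE SELECT a FROM sort_test ORDER BY a;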

2) the remaining regressions are unlikely to matter in practice

The regressions are limited to low work_mem settings, which we
believe are not representative (or at least much less representative
than the higher work_mem values), for two main reasons.

Firstly, if you need to sort a lot of data (e.g. the 10M rows
benchmarked), it's quite reasonable to use a larger work_mem value.
It would be a bit backwards to reject a patch that gives you a 2-4x
speedup with enough memory on the grounds that it may have a negative
impact with unreasonably small work_mem values.
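
To be concrete, bumping work_mem for a session (or just one query) is
cheap and routine - the value below is only an example, and sort_test
is the illustrative table from above:

  SET work_mem = '256MB';  -- example value, sized to the data set
  EXPLAIN ANALYZE SELECT a FROM sort_test ORDER BY a;
  RESET work_mem;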

Secondly, master is faster only if the replacement selection heap
(memtuples) fits into the CPU cache, and the benchmark is not
realistic in this respect: it ran only one query at a time, so that
query had the whole cache to itself (6MB on the i5, 12MB on the
Xeon).

In reality there will be multiple processes running at the same time
(e.g. backends executing a parallel query), significantly reducing
the amount of cache available per process: with the 12MB Xeon cache
shared by, say, 8 backends, each gets only ~1.5MB. That makes
replacement selection inefficient on master too, eliminating the
regressions (by making master slower).

3) replacement_sort_mem GUC

I'm not quite sure what the plan is for this GUC. It was useful for
development, but it seems pretty difficult to tune in practice
(especially if you don't know the internals, which users generally
don't).

The current patch places the new GUC right next to work_mem, which
seems rather unfortunate - I expect users to simply mess with it,
assuming "more is better", which seems like a rather poor idea.

So I think we should either remove the GUC entirely, or move it to
the developer section next to trace_sort (and remove it from the
sample config file).

I'm also wondering whether the 16MB default is not a bit too high.
As explained above, that's more cache than we should expect per
process, so maybe ~2-4MB would be a better default?
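
What I'd expect people to experiment with is something like this
(values are examples only; trace_sort is the existing developer GUC,
and I'm assuming replacement_sort_mem accepts units like the other
memory GUCs):

  SET trace_sort = on;               -- run/merge details go to the log
  SET replacement_sort_mem = '2MB';  -- a realistic per-process cache share
  EXPLAIN ANALYZE SELECT a FROM sort_test ORDER BY a;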

Also, now that I'm re-reading the docs for the GUC, I realize the
right value also depends on how correlated the input data is. That
seems like a rather useless criterion for tuning, though, because it
varies per sort node, so using it for a GUC set in postgresql.conf
does not seem very wise. Even on a per-query basis it's rather
dubious, as it depends on how the sort node receives its data (some
nodes preserve ordering, some don't).
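
The statistic I have in mind is one we already collect, e.g. for the
table sketched earlier:

  ANALYZE sort_test;  -- make sure the stats exist

  SELECT tablename, attname, correlation
    FROM pg_stats
   WHERE tablename = 'sort_test' AND attname = 'a';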

BTW, couldn't we tune the value automatically for each sort, using
pg_stats.correlation for the sort keys when available (increasing
replacement_sort_mem when the correlation is close to 1.0)? Wouldn't
that fix at least some of the regressions?
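
A crude sketch of that heuristic, done by hand in SQL (the 0.95
threshold is arbitrary, and I'm assuming a small value effectively
disables replacement selection):

  DO $$
  DECLARE
    corr float4;
  BEGIN
    SELECT correlation INTO corr
      FROM pg_stats
     WHERE tablename = 'sort_test' AND attname = 'a';

    IF corr IS NOT NULL AND corr > 0.95 THEN
      -- nearly ordered input: replacement selection may pay off
      PERFORM set_config('replacement_sort_mem', '16MB', false);
    ELSE
      -- otherwise prefer quicksorted runs from the start
      PERFORM set_config('replacement_sort_mem', '64kB', false);
    END IF;
  END $$;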

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
