Re: Using quicksort for every external sort run

From: Greg Stark <stark(at)mit(dot)edu>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Using quicksort for every external sort run
Date: 2016-04-02 22:20:24
Message-ID: CAM-w4HNWd6CdGjzmka0x1BHgGKtMp0iYzkF1PGyBzdTVvtV=dw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 2, 2016 at 3:31 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> So let me be clear: I do think the patch seems to be a significant
> performance improvement for most of the queries, and I'm OK with accepting a
> few regressions (particularly if we agree those are pathological cases,
> unlikely to happen in real-world workloads).

The ultra-short version of this is:

8MB: 0.98
32MB: 0.79
128MB: 0.63
512MB: 0.51
1GB: 0.42

These are the averages across all queries across all data sets for the
run-time for the patch versus master (not patched 64 which I think is
the replacement_sort_mem=64MB which appears to not be a win). So even
in the less successful cases on average quicksort is faster than
replacement selection.

But selecting just the cases where 8MB is significantly slower than
master it does look like the "padding" data sets are endemic.

On the one hand that's a very realistic use-case where I think a lot
of users find themselves. I know in my days as a web developer I
typically threw a lot of columns into my queries and through a lot of
joins and order by and then left it to the application to pick through
the recordsets that were returned for the columns that were of
interest. The tuples being sorted were probably huge.

On the other hand perhaps this is something better tackled by the
planner. If the planner can arrange sorts to happen when the rows are
narrower that would be a a bigger win than trying to move a lot of
data around like this. (In the extreme if it were possible to replace
unnecessary columns by the tid and then refetching them later though
that's obviously more than a little tricky to do effectively.)

There are also some weird cases in this list where there's a
significant regression at 32MB but not at 8MB. I would like to see
16MB and perhaps 12MB and 24MB. They would help understand if these
are just quirks or there's a consistent pattern.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-04-02 22:22:11 Re: Using quicksort for every external sort run
Previous Message Alvaro Herrera 2016-04-02 21:57:48 Re: standalone backend PANICs during recovery