Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-11-19 20:43:53
Message-ID: CAM3SWZTX5=nHxPpogPirQsH4cR+BpQS6r7Ktax0HMQiNLf-1qA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 18, 2015 at 6:19 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Nov 18, 2015 at 6:29 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> In principle, I have no problem with doing that. Through testing, I
>> cannot see any actual upside, though. Perhaps I just missed something.
>> Even 8MB is enough to avoid the multipass merge in the event of a
>> surprisingly high volume of data (my work laptop is elsewhere, so I
>> don't have my notes on this in front of me, but I figured out the
>> crossover point for a couple of cases).
>
> I'd be interested in seeing this analysis in some detail.

Sure. Jeff mentioned 8MB as a work_mem setting, so let's examine a
case where that's the work_mem setting, and see experimentally where
the crossover point for a multi-pass sort ends up.

If this table is created:

postgres=# create unlogged table bar as select (random() * 1e9)::int4
idx, 'payload xyz'::text payload from generate_series(1, 10100000) i;
SELECT 10100000

Then, on my system, a work_mem setting of 8MB *just about* avoids
seeing the multipass_warning message with this query:

postgres=# select count(distinct idx) from bar ;

count
------------
10,047,433
(1 row)

A work_mem setting of 235MB is just enough to make the query's sort
fully internal.

Let's see how things change with a higher work_mem setting of 16MB. I
mentioned quadratic growth: Having doubled work_mem, let's *quadruple*
the number of tuples, to see where this leaves a 16MB setting WRT a
multi-pass merge:

postgres=# drop table bar ;
DROP TABLE
postgres=# create unlogged table bar as select (random() * 1e9)::int4
idx, 'payload xyz'::text payload from generate_series(1, 10100000 * 4)
i;
SELECT 40400000

Further experiments show that this is the exact point at which the
16MB work_mem setting similarly narrowly avoids a multi-pass warning.
This should be the dominant consideration, because now a fully
internal sort requires 4X the work_mem of my original 16MB work_mem
example table/query.

The quadratic growth in a simple hybrid sort-merge strategy's ability
to avoid a multi-pass merge phase (growth relative to linear increases
in work_mem) can be demonstrated with simple experiments.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2015-11-19 20:52:10 Re: Using quicksort for every external sort run
Previous Message Greg Stark 2015-11-19 20:35:24 Re: Using quicksort for every external sort run