Re: Using quicksort for every external sort run

From: Greg Stark <stark(at)mit(dot)edu>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Using quicksort for every external sort run
Date: 2015-11-25 01:42:13
Message-ID: CAM-w4HPtmmNsRixXSWrbZxSB2=eaahJuQrzkyMeT69H_4M2wog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 25, 2015 at 12:33 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Tue, Nov 24, 2015 at 3:32 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> My feeling is that numbers rarely speak for themselves, without LSD. (Which
>> numbers?)
>
> Guffaw.

Actually I kind of agree. What I would like to see is a series of
numbers for increasing sizes of sorts plotted against the same series
for the existing algorithm. Specifically with the sort size varying to
significantly more than the physical memory on the machine. For
example on a 16GB machine sorting data ranging from 1GB to 128GB.

There's a lot more information in a series of numbers than individual
numbers. We'll be able to see whether all our pontificating about the
rates of growth of costs of different algorithms or which costs
dominate at which scales are actually borne out in reality. And see
where the break points are where I/O overtakes memory costs. And it'll
be clearer where to look for problematic cases where the new algorithm
might not dominate the old one.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-11-25 01:46:04 Re: Revisiting pg_stat_statements and IN() (Was: Re: pg_stat_statements fingerprinting logic and ArrayExpr)
Previous Message Michael Paquier 2015-11-25 01:39:16 Re: WIP: About CMake v2