Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Using quicksort for every external sort run
Date: 2016-01-29 17:46:47
Message-ID: CAM3SWZR14_4kswgoWLZTkQsAtTCNo4VbjYcSAhbE0MKNOX1m1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 29, 2016 at 9:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I feel like this could be data driven. I mean, the cost model is
> based mainly on the tuple width and the size of the SortTuple array.
> So, it should be possible to tests of both algorithms on 32, 64, 96,
> 128, ... byte tuples with a SortTuple array that is 256MB, 512MB,
> 768MB, 1GB, ... Then we can judge how closely the cost model comes to
> mimicking the actual behavior.

You would also need to represent how much of the input actually ended
up being sorted with the heap in each case. Maybe that could be tested
at 50% (bad for "quicksort with spillover"), 25% (better), and 5%
(good).

An alternative approach that might be acceptable is to add a generic,
conservative 90% threshold (so 10% of tuples sorted by heap).

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2016-01-29 20:55:55 Re: Sequence Access Method WIP
Previous Message Thom Brown 2016-01-29 17:43:02 Re: [WIP] Effective storage of duplicates in B-tree index.