Re: Using quicksort for every external sort run

From: Feng Tian <ftian(at)vitessedata(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-08-20 20:28:27
Message-ID: CAFWGqnvHRDFxvxjjEgvnsTrgU3rLPN8W17O7Y-UbrCGAYka=mA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 20, 2015 at 1:16 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian <ftian(at)vitessedata(dot)com> wrote:
> > Just a quick anecdotal evidence. I did similar experiment about three
> years
> > ago. The conclusion was that if you have SSD, just do quick sort and
> > forget the longer runs, but if you are using hard drives, longer runs is
> the
> > winner (and safer, to avoid cliffs). I did not experiment with
> RAID0/5 on
> > many spindles though.
> >
> > Not limited to sort, more generally, SSD is different enough from HDD,
> > therefore it may worth the effort for backend to "guess" what storage
> device
> > it has, then choose the right thing to do.
>
> The devil is in the details. I cannot really comment on such a general
> statement.
>
> I would be willing to believe that that's true under
> unrealistic/unrepresentative conditions. Specifically, when multiple
> passes are required with a sort-merge strategy where that isn't the
> case with replacement selection. This could happen with a tiny
> work_mem setting (tiny in an absolute sense more than a relative
> sense). With an HDD, where sequential I/O is so much faster, this
> could be enough to make replacement selection win, just as it would
> have in the 1970s with magnetic tapes.
>
> As I've said, the solution is to simply avoid multiple passes, which
> should be possible in virtually all cases because of the quadratic
> growth in a classic hybrid sort-merge strategy's capacity to avoid
> multiple passes (growth relative to work_mem's growth). Once you
> ensure that, then you probably have a mostly I/O bound workload, which
> can be made faster by adding sequential I/O capacity (or, on the
> Postgres internals side, adding asynchronous I/O, or with memory
> prefetching). You cannot really buy a faster CPU to make a degenerate
> heapsort faster.
>
> --
> Peter Geoghegan
>

Agree everything in principal,except one thing -- no, random IO on HDD in
2010s (relative to CPU/Memory/SSD), is not any faster than tape in 1970s.
:-)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-08-20 20:31:16 Re: Using quicksort for every external sort run
Previous Message Peter Geoghegan 2015-08-20 20:16:30 Re: Using quicksort for every external sort run