Re: Using quicksort for every external sort run

From: Greg Stark <stark(at)mit(dot)edu>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-11-19 20:52:10
Message-ID: CAM-w4HO4yzmuSDdW6hmHyrbEfYa=BmD-Qs_6ccfT+zHny19YXA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 19, 2015 at 8:35 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> Hm. So a bit of back-of-envelope calculation. If we want to
> buffer at least 1MB for each run -- I think we currently do more
> actually -- and say that a 1GB work_mem ought to be enough to run
> reasonably (that's per sort after all and there might be multiple
> sorts to say nothing of other users on the system). That means we can
> merge about 1,000 runs in the final merge. Each run will be about 2GB
> currently but 1GB if we quicksort the runs. So the largest table we
> can sort in a single pass is 1-2 TB.

For the sake of pedantry I fact-checked myself. We calculate the
number of tapes based on wanting to buffer 32 blocks plus overhead, so
about 256kB per tape. So the actual maximum you can handle with 1GB of
work_mem without multiple merges is on the order of 4-8TB.
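
The arithmetic behind both estimates can be sketched as follows. This is
only a back-of-envelope model, not what tuplesort.c actually computes: the
helper name max_single_pass_bytes is made up for illustration, and it
assumes every merge input tape gets a fixed buffer, that a quicksorted run
is about the size of work_mem, and that a replacement-selection run is
about twice that. It ignores the output tape and per-tuple overhead.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB


def max_single_pass_bytes(work_mem, per_tape_buffer, run_size):
    """Largest input (in bytes) mergeable in one pass under this rough model.

    Hypothetical helper: the number of runs merged at once is taken to be
    work_mem divided by the buffer reserved for each input tape.
    """
    tapes = work_mem // per_tape_buffer
    return tapes * run_size


work_mem = 1 * GB

# Assumed run sizes: ~work_mem with quicksort, ~2x work_mem with
# replacement selection.
quicksort_run = work_mem
replacement_run = 2 * work_mem

# First estimate: 1MB buffered per run, so 1024 runs in the final merge.
low_1mb = max_single_pass_bytes(work_mem, 1 * MB, quicksort_run)
high_1mb = max_single_pass_bytes(work_mem, 1 * MB, replacement_run)

# Corrected estimate: about 256kB per tape, so 4096 runs in the final merge.
low_256k = max_single_pass_bytes(work_mem, 256 * KB, quicksort_run)
high_256k = max_single_pass_bytes(work_mem, 256 * KB, replacement_run)

print(f"1MB per tape:   {low_1mb // TB}-{high_1mb // TB} TB")
print(f"256kB per tape: {low_256k // TB}-{high_256k // TB} TB")
```

Shrinking the per-tape buffer by a factor of four quadruples the number of
runs the final merge can take, which is where the jump from 1-2TB to 4-8TB
comes from.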

--
greg
