Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-10 03:12:38
Message-ID: CAH2-Wz=KRxae5sShmsBh7vmXgimQfnQVpT+=ftEWGq7oFU8NUQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 8, 2018 at 9:44 PM, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> wrote:
> I have gone through the changes and performed basic testing. The changes
> look good, and I haven't found anything unusual during testing.

Then I'll mark the patch "Ready for Committer" now. I think that we've
done just about all we can with it.

There is one lingering concern that I cannot shake, which stems from
the fact that the cost model (plan_create_index_workers()) follows the
same generic logic for adding workers as parallel sequential scan, per
Robert's feedback from around March of last year (that is, we more or
less just reuse compute_parallel_worker()). My specific concern is
that this approach may be too aggressive in situations where a
parallel external sort ends up being used instead of a serial internal
sort. No weight is given to any extra temp file costs; a serial
external sort is, in a sense, the baseline, including in cases where
the table is very small and an external sort can actually easily be
avoided iff we do a serial sort.

This is probably not worth doing anything about. The distinction
between internal and external sorts became rather blurred in 9.6 and
10, which, in a way, this patch builds on. If what I describe is a
problem at all, it will very probably only be a problem on small
CREATE INDEX operations, where linear sequential I/O costs are not
already dwarfed by the linearithmic CPU costs. (The dominance of
CPU/comparison costs on larger sorts is the main reason why external
sorts can be faster than internal sorts -- this happens fairly
frequently these days, especially with CREATE INDEX, where being able
to write out the index as it merges on-the-fly helps a lot.)
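A rough back-of-the-envelope model shows why small sorts are where the extra I/O would matter. Assume (this is a simplification, not PostgreSQL's cost model) that sorting n tuples costs about c_cpu per comparison with roughly n log2 n comparisons, and that writing and rereading temp files costs about c_io per tuple:

```latex
C(n) \approx c_{\mathrm{cpu}}\, n \log_2 n + c_{\mathrm{io}}\, n,
\qquad
\frac{c_{\mathrm{io}}\, n}{c_{\mathrm{cpu}}\, n \log_2 n}
  = \frac{c_{\mathrm{io}}}{c_{\mathrm{cpu}}\, \log_2 n}
```

The relative weight of the linear I/O term shrinks as n grows: at n = 2^30 tuples it is three times smaller than at n = 2^10, since log2 n goes from 10 to 30.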

--
Peter Geoghegan
