Re: Cost model for parallel CREATE INDEX

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cost model for parallel CREATE INDEX
Date: 2017-03-06 00:14:03
Message-ID: CAH2-Wz=H8Xkb36kS+0WuTN4hS=z53+xj+x5G15jmPjfiUx7-7Q@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> So, I agree with Robert that we should actually use heap size for the
> main, initial determination of # of workers to use, but we still need
> to estimate the size of the final index [1], to let the cost model cap
> the initial determination when maintenance_work_mem is just too low.
> (This cap will rarely be applied in practice, as I said.)
>
> [1] https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect

Having looked at it some more, this no longer seems worthwhile. In the
next revision, I will instead add a backstop that limits the use of
parallelism when maintenance_work_mem is too low, in a simpler manner.
Namely, each worker must be left with a maintenance_work_mem/nworkers
share of no less than 32MB in order for parallel CREATE INDEX to
proceed. There doesn't seem to be any great reason to bring the volume
of data to be sorted into this determination.

I expect the cost model to be significantly simplified in the next
revision in other ways, too. There will be no new index storage
parameter, and no disable_parallelddl GUC. compute_parallel_worker()
will be called in a fairly straightforward way within
plan_create_index_workers(), using heap blocks, as already agreed.
pg_restore will avoid parallelism (by setting
"max_parallel_workers_maintenance = 0" when it runs), not because it
cannot trust the cost model, but because it prefers to parallelize
things its own way (with multiple restore jobs), and because execution
speed may not be the top priority for pg_restore, unlike on a live
production system.

--
Peter Geoghegan
