Re: more parallel query documentation

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: more parallel query documentation
Date: 2016-04-15 20:15:37
Message-ID: 57114BE9.7020707@BlueTreble.com
Lists: pgsql-hackers

On 4/14/16 10:02 PM, Robert Haas wrote:
> As previously threatened, I have written some user documentation for
> parallel query. I put it up here:

Yay! Definitely needed to be written. :)

There should be a section that summarizes the parallel machinery. I
think the most important points are that separate processes are spun up,
that they're limited by max_worker_processes and max_parallel_degree,
and that shared memory queues are used to move data, results and errors
between a regular backend (controlling backend?) and its workers. The
first section kind of alludes to this, but it doesn't actually explain
any of it. I think it's OK for the very first section to be a *brief*
tl;dr summary on the basics of turning the feature on, but after that
laying down groundwork knowledge will make the rest of the page much
clearer.
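
To make the kind of summary I have in mind concrete, here is a rough
Python analogy. It is only a sketch of the model, not how the backend
implements it: threads stand in for the separate worker processes, a
queue.Queue stands in for the shared memory queues, and the names
workers_to_launch, run_query and free_worker_slots are made up for
illustration. The leader's own participation in executing the plan is
left out for simplicity.

```python
import queue
import threading


def workers_to_launch(planned_degree, max_parallel_degree, free_worker_slots):
    """Workers that actually start: capped by the degree setting and by the
    slots still free under max_worker_processes. The result may be zero."""
    return max(0, min(planned_degree, max_parallel_degree, free_worker_slots))


def run_query(chunks, planned_degree, max_parallel_degree, free_worker_slots):
    """Sum all values in chunks, returning (total, workers_used)."""
    n = workers_to_launch(planned_degree, max_parallel_degree, free_worker_slots)
    if n == 0:
        # No workers available: the plan still runs, in the leader alone.
        return sum(sum(c) for c in chunks), 0

    results = queue.Queue()  # stand-in for the shared memory queues

    def worker(my_chunks):
        # Each worker sends back either a partial result or an error.
        try:
            results.put(("result", sum(sum(c) for c in my_chunks)))
        except Exception as exc:
            results.put(("error", exc))

    threads = [
        threading.Thread(target=worker, args=(chunks[i::n],)) for i in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = 0
    for _ in range(n):
        kind, payload = results.get()
        if kind == "error":
            raise payload  # the leader re-raises a worker's error
        total += payload
    return total, n
```

The zero-worker branch is the case I find confusingly described below:
the query still runs and returns the same answer, just without any
parallelism.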

I think the parts that talk about "parallel plan executed with no
workers" are confusing... it almost sounds like the query won't be
executed at all. It'd be better to say something like "executed in a
single process" or "executed with no parallelism" or similar. Maybe the real
issue is we need to pick a clear term for a non-parallel query and stick
with it. I would also expand the different scenarios into bullets and
explain why parallelism isn't used, like you did right above that. (I
think it's great that you explained *why* parallel plans wouldn't be
generated instead of just listing conditions.)

When describing SeqScan, it would be good to clarify whether
effective_io_concurrency has an effect. (For that matter, does
effective_io_concurrency interact with any of the other parallel settings?)

"Functions must be marked PARALLEL UNSAFE ..., or make persistent
changes to settings." What would be a non-persistent change? SET LOCAL?
(This is another case where it'd be good if we decided on specific
terminology and referenced the definition from the page.)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com