Re: assessing parallel-safety

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: assessing parallel-safety
Date: 2015-02-15 23:24:47
Message-ID: 20150215232447.GA3966927@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

It's important for parallelism to work under the extended query protocol, and
that's nontrivial. exec_parse_message() sets cursorOptions.
exec_bind_message() runs the planner. We don't know if a parallel plan is
okay before seeing max_rows in exec_execute_message().

On Sun, Feb 15, 2015 at 01:29:06AM -0500, Robert Haas wrote:
> - For testing purposes, I set this up so that the executor activates
> parallel mode when running any query that seems to be parallel-safe.
> It seems nearly certain we'd only want to do this when a parallel plan
> was actually selected, but this behavior is awfully useful for
> testing, and revealed a number of bugs in the parallel-safety
> markings.

Why wouldn't that be a good permanent behavior? The testing benefit extends
to users. If I wrongly mark a function something other than
PROPARALLEL_UNSAFE, I'd rather find out ASAP.

> - The patch includes a script, fixpgproc.pl, which I used to insert
> and update the parallel markings on the system catalogs. That is, of
> course, not intended for commit, but it's mighty useful for
> experimentation.

The CREATE OR REPLACE FUNCTION calls in src/backend/catalog/system_views.sql
reset proparallel=u for some built-in functions, such as make_interval.

Some categories of PROPARALLEL_SAFE functions deserve more attention:

- database_to_xml* need to match proparallel of schema_to_xml*, which are
PROPARALLEL_UNSAFE due to the heavyweight lock rule. cursor_to_xml*
probably need to do likewise.

- Several functions, such as array_in and anytextcat, can call arbitrary I/O
functions. This implies a policy that I/O functions must be
PROPARALLEL_SAFE. I agree with that decision, but it should be documented.
Several json-related functions can invoke X->json cast functions; the
marking implies that all such cast functions must be PROPARALLEL_SAFE.
That's probably okay, too.

- All selectivity estimator functions are PROPARALLEL_SAFE. That implies, for
example, that any operator using eqjoinsel ought to be PROPARALLEL_SAFE or
define its own restriction selectivity estimator function. On the other
hand, the planner need not check the parallel safety of selectivity
estimator functions. If we're already in parallel mode at the moment we
enter the planner, we must be planning a query called within a
PROPARALLEL_SAFE function. If the query uses an unsafe operator, the
containing function was mismarked.

- inet_{client,server}_{addr,port} are PROPARALLEL_SAFE; will the information
they need indeed be available in workers?

- Assuming you don't want to propagate XactLastRecEnd from the slave back to
the master, restrict XLogInsert() during parallel mode. Restrict functions
that call it, including pg_create_restore_point, pg_switch_xlog and
pg_stop_backup.

- pg_extension_config_dump modifies the database, so it shall be
PROPARALLEL_UNSAFE. Assuming pg_stat_clear_snapshot won't reach back to
the master to clear the snapshot, it shall be PROPARALLEL_RESTRICTED.

If a mismarked function elicits "ERROR: cannot retain locks acquired while in
parallel mode", innocent queries later in the transaction get the same error:

begin;
select 1; -- ok
savepoint q; select database_to_xml(true, true, 'x'); rollback to q; -- ERROR
select 1; -- ERROR
rollback;

> --- a/src/backend/commands/typecmds.c
> +++ b/src/backend/commands/typecmds.c
> @@ -1610,6 +1610,7 @@ makeRangeConstructors(const char *name, Oid namespace,
> false, /* leakproof */
> false, /* isStrict */
> PROVOLATILE_IMMUTABLE, /* volatility */
> + PROPARALLEL_UNSAFE, /* parallel safety */

Range constructors are PROPARALLEL_SAFE.

> --- a/src/backend/executor/functions.c
> +++ b/src/backend/executor/functions.c
> @@ -496,7 +496,9 @@ init_execution_state(List *queryTree_list,
> if (queryTree->commandType == CMD_UTILITY)
> stmt = queryTree->utilityStmt;
> else
> - stmt = (Node *) pg_plan_query(queryTree, 0, NULL);
> + stmt = (Node *) pg_plan_query(queryTree,
> + fcache->readonly_func ? CURSOR_OPT_PARALLEL_OK : 0,
> + NULL);

(This code is deciding whether to allow parallelism in one statement of an
SQL-language function.) Why does it care if the function is read-only?

I see that PL/pgSQL never allows parallelism in queries it launches.

> --- a/src/include/utils/lsyscache.h
> +++ b/src/include/utils/lsyscache.h
> @@ -79,6 +79,7 @@ extern bool op_mergejoinable(Oid opno, Oid inputtype);
> extern bool op_hashjoinable(Oid opno, Oid inputtype);
> extern bool op_strict(Oid opno);
> extern char op_volatile(Oid opno);
> +extern char op_parallel(Oid opno);

No such function defined.

In passing, I noticed a few cosmetic problems in the prerequisite
parallel-mode-v6.patch. Typos: initating Musn't paralell paralellism
paralllelism processe. Capitalization of "transactionIdPrecedes". When
applying the patch, this:

$ git apply ../rev/parallel-mode-v6.patch
../rev/parallel-mode-v6.patch:2897: indent with spaces.
(nLocksAcquiredInParallelMode - n));
warning: 1 line adds whitespace errors.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2015-02-15 23:31:26 Issue installing doc tools on OSX
Previous Message Christoph Berg 2015-02-15 21:56:37 Re: gcc5: initdb produces gigabytes of _fsm files