
Re: parallel_safe

From:	Andy Fan <zhihuifan1213(at)163(dot)com>
To:	Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: parallel_safe
Date:	2025-05-23 00:47:28
Message-ID:	87v7psqhjz.fsf@163.com
Lists:	pgsql-hackers

Andy Fan <zhihuifan1213(at)163(dot)com> writes:

Hi,

Some clearer ideas are provided below. Any feedback telling me whether this
is *obviously wrong* or *not obviously wrong* is welcome.

> see the below example:
>
> create table bigt (a int, b int, c int);
> insert into bigt select i, i, i from generate_series(1, 1000000)i;
> analyze bigt;
>
> select * from bigt o where b = 1 and c = (select sum(c) from bigt i where c = o.c);
..
> I think the plan below would be correct and more efficient, but it cannot be generated today.
>
> Plan 1:
>
>                     QUERY PLAN
> -------------------------------------------------
>  Gather
>    Workers Planned: 2
>    ->  Parallel Seq Scan on bigt o
>          Filter: ((b = 1) AND (c = (SubPlan 1)))
>          SubPlan 1
>            ->  Aggregate
>                  ->  Seq Scan on bigt
>                        Filter: (c = o.c)
> (8 rows)
>
> because:
>
> (1). During the planning of the SubPlan, we use is_parallel_safe() to
> set consider_parallel of "bigt i" to false because of the
> "PARAM_EXEC" reason above.
>
> (2). The parallel_safe of the final SubPlan is set to false due to
> rel->consider_parallel.
>
> (3). During the planning of "bigt o", is_parallel_safe() is called and
> finds subplan->parallel_safe == false, so no partial path can be
> generated.
>
>
> I think it is worth revisiting what parallel_safe is designed
> for. In Path:
>
> The definition seems to say: (1) the Path/Plan should not be run as a
> 'parallel_aware' plan, but the code seems to say: (2) the Path/Plan
> should not be run in a parallel worker even if it is *not*
> parallel_aware.
..
> So parallel_safe seems to have two different meanings to me.
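To make steps (1) to (3) concrete, here is a toy C model of how the current boolean flag propagates. ToyRel, ToySubPlan and the toy_* functions are hypothetical simplifications, not planner code; the real check is is_parallel_safe(), which considers much more than outer PARAM_EXEC references.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for RelOptInfo and SubPlan. */
typedef struct ToyRel
{
    bool refs_outer_param;  /* quals reference a PARAM_EXEC from an outer level */
    bool consider_parallel; /* mirrors RelOptInfo->consider_parallel */
} ToyRel;

typedef struct ToySubPlan
{
    bool parallel_safe;     /* mirrors SubPlan->parallel_safe */
} ToySubPlan;

/* Step (1): an outer PARAM_EXEC reference makes the rel not parallel-safe
 * (simplified version of the is_parallel_safe() outcome). */
static void
toy_set_consider_parallel(ToyRel *rel)
{
    assert(rel);
    rel->consider_parallel = !rel->refs_outer_param;
}

/* Step (2): the final SubPlan inherits its safety from the subquery's rel. */
static ToySubPlan
toy_make_subplan(const ToyRel *rel)
{
    ToySubPlan sp = { .parallel_safe = rel->consider_parallel };
    return sp;
}

/* Step (3): the outer rel may get partial paths only if every SubPlan in
 * its quals is parallel-safe; a single 'false' blocks them all. */
static bool
toy_outer_allows_partial_paths(const ToySubPlan *subplans, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        if (!subplans[i].parallel_safe)
            return false;
    }
    return true;
}
```

With two states there is no way to say "fine inside a worker, just not as a partial path", so the correlated SubPlan in the example always lands in the blocking branch of step (3).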

I'd like to change 'bool parallel_safe' to 'ParallelSafety
parallel_safe' in RelOptInfo, Path and Plan (and rename
RelOptInfo->consider_parallel to parallel_safe for consistency).

ParallelSafety would have 3 values:

1. PARALLEL_UNSAFE = 0 // default. This behaves exactly the same as the
current parallel_safe = false. When it is set on a RelOptInfo, only the
non-partial pathlist of this RelOptInfo should be considered. When it is
set on a Path/Plan, no parallel worker may run the Path/Plan.

2. PARALLEL_WORKER_SAFE = 1 // We can set parallel_safe to this value in
the PARAM_EXEC case (when neither a parallel-unsafe function nor a
Gather/GatherMerge node is involved). The theory behind it is that a
non-partial path always produces a complete result, no matter which
PARAM_EXEC values the different workers use. The impact is that no
partial path should be considered for this RelOptInfo, but its
non-partial path/plan could be used together with other partial paths.

3. PARALLEL_PARTIALPATH_SAFE = 2 // same as the current parallel_safe = true.
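A minimal C sketch of the proposed type. Only the enum name and its three values come from the proposal; the helpers worker_may_run() and partial_paths_allowed() are hypothetical and just spell out what each value would permit.

```c
#include <assert.h>
#include <stdbool.h>

/* Proposed replacement for 'bool parallel_safe' (not in any released
 * PostgreSQL; values as listed in the proposal). */
typedef enum ParallelSafety
{
    PARALLEL_UNSAFE = 0,          /* default; same as parallel_safe = false */
    PARALLEL_WORKER_SAFE = 1,     /* may run in a worker, non-partial paths only */
    PARALLEL_PARTIALPATH_SAFE = 2 /* same as parallel_safe = true */
} ParallelSafety;

/* Hypothetical helper: may a parallel worker execute this Path/Plan at all? */
static bool
worker_may_run(ParallelSafety s)
{
    assert(s <= PARALLEL_PARTIALPATH_SAFE);
    return s >= PARALLEL_WORKER_SAFE;
}

/* Hypothetical helper: may partial paths be built for a rel with this value? */
static bool
partial_paths_allowed(ParallelSafety s)
{
    assert(s <= PARALLEL_PARTIALPATH_SAFE);
    return s == PARALLEL_PARTIALPATH_SAFE;
}
```

Keeping the values ordered 0 < 1 < 2 means "at least worker-safe" can be written as a single >= comparison, and "safe for partial paths" implies "safe in a worker".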

With this design, more plans containing SubPlans could be parallelized.
Take my case for example:

select * from bigt o where b = 1 and c = (select sum(c) from bigt i
where c = o.c);

The RelOptInfo of 'bigt i' would have parallel_safe =
PARALLEL_WORKER_SAFE, so only non-partial paths would be generated, and
the final SubPlan would have parallel_safe = PARALLEL_WORKER_SAFE.
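One possible reading of the classification rules from item 2, applied to 'bigt i', sketched in C. The RelHazards struct and classify_rel() are hypothetical; in a real patch this information would come from walking the rel's expressions, roughly what is_parallel_safe() does today.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ParallelSafety
{
    PARALLEL_UNSAFE = 0,
    PARALLEL_WORKER_SAFE = 1,
    PARALLEL_PARTIALPATH_SAFE = 2
} ParallelSafety;

/* Hypothetical summary of what the planner found for a rel. */
typedef struct RelHazards
{
    bool has_unsafe_function;   /* a parallel-unsafe function appears */
    bool has_gather;            /* a Gather/GatherMerge would be nested below */
    bool refs_outer_param_exec; /* references a PARAM_EXEC from an outer level */
} RelHazards;

/* Hypothetical classification following item 2 of the proposal. The final
 * SubPlan would then simply carry this value upward. */
static ParallelSafety
classify_rel(const RelHazards *h)
{
    assert(h);
    if (h->has_unsafe_function || h->has_gather)
        return PARALLEL_UNSAFE;
    if (h->refs_outer_param_exec)
        return PARALLEL_WORKER_SAFE; /* non-partial paths only */
    return PARALLEL_PARTIALPATH_SAFE;
}
```

For 'bigt i', the only hazard is the correlated qual c = o.c, so the sketch yields PARALLEL_WORKER_SAFE instead of today's false.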

When planning the RelOptInfo of 'bigt o', we only check whether
SubPlan->parallel_safe is PARALLEL_UNSAFE. It is not, so
RelOptInfo->parallel_safe ends up as PARALLEL_PARTIALPATH_SAFE, we can
populate partial_pathlist for it, and the desired plan can be
generated.
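The outer-rel rule can be sketched the same way. outer_rel_safety() is hypothetical and assumes the SubPlans are the only parallel hazards in the outer rel's quals; under the proposal, a PARALLEL_WORKER_SAFE SubPlan no longer blocks partial paths.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ParallelSafety
{
    PARALLEL_UNSAFE = 0,
    PARALLEL_WORKER_SAFE = 1,
    PARALLEL_PARTIALPATH_SAFE = 2
} ParallelSafety;

/* Hypothetical: derive the outer rel's parallel_safe from the SubPlans in
 * its quals. Only a PARALLEL_UNSAFE SubPlan blocks parallelism. */
static ParallelSafety
outer_rel_safety(const ParallelSafety *subplan_safety, int nsubplans)
{
    assert(nsubplans >= 0);
    for (int i = 0; i < nsubplans; i++)
    {
        if (subplan_safety[i] == PARALLEL_UNSAFE)
            return PARALLEL_UNSAFE;
    }
    /* PARALLEL_WORKER_SAFE SubPlans are acceptable: each worker runs the
     * complete, non-partial SubPlan with its own PARAM_EXEC values. */
    return PARALLEL_PARTIALPATH_SAFE;
}
```

Applied to the example, the single SubPlan is PARALLEL_WORKER_SAFE, so 'bigt o' becomes PARALLEL_PARTIALPATH_SAFE and a Parallel Seq Scan under Gather, as in Plan 1, becomes a candidate.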

--
Best Regards
Andy Fan

In response to

parallel_safe at 2025-05-21 06:49:28 from Andy Fan

Responses

parallel safety of correlated subquery (was: parallel_safe) at 2025-07-02 07:02:52 from Andy Fan
