Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-09 09:06:33
Message-ID: CAJcOf-f2N3+hf7=L0qxnM9hhJK=MQ7ZoZ24NoZfgOU+SBZXuOw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 9, 2020 at 6:31 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> A couple more observations:
>
> + pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;
>
> Hmm, I think this may be bogus window dressing only affecting EXPLAIN.
> If you change it to assign false always, it works just the same,
> except EXPLAIN says:
>
> Gather (cost=15428.00..16101.14 rows=1000000 width=4)
> Workers Planned: 2
> -> Insert on s (cost=15428.00..16101.14 rows=208334 width=4)
> -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=4)
>
> ... instead of:
>
> Gather (cost=15428.00..16101.14 rows=1000000 width=4)
> Workers Planned: 2
> -> Parallel Insert on s (cost=15428.00..16101.14 rows=208334 width=4)
> -> Parallel Hash Join (cost=15428.00..32202.28 rows=416667 width=4)
>
> AFAICS it's not parallel-aware, it just happens to be running in
> parallel with a partial input and partial output (and in this case,
> effect in terms of writes). Parallel-aware is our term for nodes that
> actually know they are running in parallel and do some special
> coordination with their twins in other processes.
>

Ah, thanks, I see the distinction now. I'll fix that and restore
parallel_aware=false for the ModifyTable node.
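
To be clear about the fix (a rough sketch of the change, not the
final patch), the relevant line in the path-generation code would
just become:

-   pathnode->path.parallel_aware = parallel_workers > 0 ? true : false;
+   pathnode->path.parallel_aware = false;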

> The estimated row count also looks wrong; at a guess, the parallel
> divisor is applied twice. Let me try that with
> parallel_leader_participation=off (which disables some funky maths in
> the row estimation and makes it straight division by number of
> processes):
>
> Gather (cost=17629.00..18645.50 rows=1000000 width=4)
> Workers Planned: 2
> -> Insert on s (cost=17629.00..18645.50 rows=250000 width=4)
> -> Parallel Hash Join (cost=17629.00..37291.00 rows=500000 width=4)
> [more nodes omitted]
>
> Yeah, that was a join that spat out a million rows, and we correctly
> estimated 500k per process, and then Insert (still with my hack to
> turn off the bogus "Parallel" display in this case, but it doesn't
> affect the estimation) estimated 250k per process, which is wrong.

Thanks, I did suspect the current costing was wrong for ModifyTable
(the workers>0 case), as I'd thrown it in (moving the existing costing
code into costsize.c) without much checking or thought, and it was on
my TODO list of things to verify. At least I created a placeholder for
it. It looks like I've applied the parallel divisor again (not
allowing for the one already applied to the underlying query), as you
said.
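
In other words (just a sketch; the names below are illustrative of the
costing code I moved into costsize.c, not the actual patch), the
per-worker row estimate should simply be inherited from the partial
subpath, which has already been divided by the parallel divisor,
rather than being divided again:

    /* wrong: subpath->rows is already a per-worker estimate */
    path->rows = clamp_row_est(subpath->rows /
                               get_parallel_divisor(&path->path));

    /* right: just inherit the partial input's per-worker estimate */
    path->rows = subpath->rows;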

Speaking of costing, I'm not sure I really agree with the current
costing of a Gather node. Just considering a simple Parallel SeqScan
case, the "run_cost += parallel_tuple_cost * path->path.rows;" part of
the Gather cost completely drowns out any other path costs when a
large number of rows is involved (at least with default
parallel-related GUC values), so a Parallel SeqScan is never the
cheapest path.

This linear relationship between cost and row count doesn't make sense
to me. Surely after a certain number of rows, the overhead of
launching workers is outweighed by the benefit of their parallel work,
so the more rows there are, the more a Parallel SeqScan should
benefit. That seems to suggest something like a logarithmic formula
(or similar) would match reality better than what we have now. Am I
wrong about this? Every time I use the default GUC values, the planner
doesn't want to generate a parallel plan. Lowering parallel-related
GUCs like parallel_tuple_cost (which I normally do for testing)
influences it, of course, but the linear relationship still seems
wrong.
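
To put rough numbers on that (back-of-the-envelope only, using the
default parallel_setup_cost = 1000 and parallel_tuple_cost = 0.1): for
the 1,000,000-row Gather in the plan above, cost_gather() adds

    parallel_setup_cost + parallel_tuple_cost * rows
        = 1000 + 0.1 * 1000000
        = 101000

on top of the partial path's cost, which is around three times the
~32000 total cost of the Parallel Hash Join feeding it, so the
per-tuple transfer charge dominates everything else in the plan.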

Regards,
Greg Nancarrow
Fujitsu Australia
