Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: "houzj(dot)fnst(at)cn(dot)fujitsu(dot)com" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-12 12:47:57
Message-ID: CAJcOf-ebs56+CcNrE3yHhPMOSB-TcrKU=1XWi6dBBCt=3XJA1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 12, 2021 at 5:30 PM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
>
> > If I disable parallel_leader_participation.
> >
> > For max_parallel_workers_per_gather = 4, It still have performance
> > degradation.
> >
> > For max_parallel_workers_per_gather = 2, the performance degradation will
> > not happen in most of the case.
> > There is sometimes a noise(performance degradation), but most of
> > result(about 80%) is good.
>
> Thank you. The results indicate that it depends on the degree of parallelism whether the gain from parallelism outweighs the loss of parallel insert operations, at least in the bitmap scan case.
>

That seems to be the pattern for this particular query, but I think
we'd need to test a variety to determine if that's always the case.

> But can we conclude that this is limited to bitmap scan? Even if that's the case, the planner does not have information about insert operation to choose other plans like serial execution or parallel sequential scan. Should we encourage the user in the manual to tune parameters and find the fastest plan?
>
>

It's all based on path costs, so we need to analyze and compare the
costing calculations done in this particular case against other cases,
and the values of the various parameters (costsize.c).
It's not difficult to determine for a parallel ModifyTablePath if it
has a BitmapHeapPath subpath - perhaps total_cost needs adjustment
(increase) for this case - and that will influence the planner to
choose a cheaper path. I was able to easily test the effect of doing
this, in the debugger - by increasing total_cost in cost_modifytable()
for the parallel bitmap heap scan case, the planner then chose a
serial Insert + bitmap heap scan, because it then had a cheaper cost.
Of course we need to better understand the problem and observed
patters in order to get a better feel of how those costs should be
adjusted.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-12 12:51:53 Re: logical replication seems broken
Previous Message Bharath Rupireddy 2021-02-12 12:40:28 Re: Improvements and additions to COPY progress reporting