Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-12 01:20:35
Message-ID: CAJcOf-fUv5v0tFHSHrrra4x+AE_qS3CqcE0JkUxdWL=U6XboyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 10, 2020 at 3:32 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > OK, for the minimal patch, just allowing INSERT with parallel SELECT,
> > you're right, neither of those additional "commandType == CMD_SELECT"
> > checks are needed, so I'll remove them.
> >
>various
> Okay, that makes sense.
>

For the minimal patch (just allowing INSERT with parallel SELECT),
there are issues with parallel-mode and various parallel-mode-related
checks in the code.
Initially, I thought it was only a couple of XID-related checks (which
could perhaps just be tweaked to check for IsParallelWorker() instead,
as you suggested), but I now realise that there are a lot more cases.
This stems from the fact that just having a parallel SELECT (as part
of non-parallel INSERT) causes parallel-mode to be set for the WHOLE
plan. I'm not sure why parallel-mode is set globally like this, for
the whole plan. Couldn't it just be set for the scope of
Gather/GatherMerge? Otherwise, errors from these checks seem to be
misleading when outside the scope of Gather/GatherMerge, as
technically they are not occurring within the scope of parallel-leader
and parallel-worker(s). The global parallel-mode wouldn't have been an
issue before, because up to now INSERT has never had underlying
parallel operations.

For example, when running the tests under
"force_parallel_mode=regress", the test failures show that there are a
lot more cases affected:

"cannot assign TransactionIds during a parallel operation"
"cannot assign XIDs during a parallel operation"
"cannot start commands during a parallel operation"
"cannot modify commandid in active snapshot during a parallel operation"
"cannot execute nextval() during a parallel operation"
"cannot execute INSERT during a parallel operation"
"cannot execute ANALYZE during a parallel operation
"cannot update tuples during a parallel operation"

(and there are more not currently detected by the tests, found by
searching the code).

As an example, with the minimal patch applied, if you had a trigger on
INSERT that, say, attempted a table creation or UPDATE/DELETE, and you
ran an "INSERT INTO ... SELECT...", it would treat the trigger
operations as being attempted in parallel-mode, and so an error would
result.

Let me know your thoughts on how to deal with these issues.
Can you see a problem with only having parallel-mode set for scope of
Gather/GatherMerge, or do you have some other idea?

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2020-10-12 01:34:40 Re: Possible NULL dereferencing null pointer (src/backend/executor/nodeIncrementalSort.c)
Previous Message Michael Paquier 2020-10-12 01:16:46 Re: powerpc pg_atomic_compare_exchange_u32_impl: error: comparison of integer expressions of different signedness (Re: pgsql: For all ppc compilers, implement compare_exchange and) fetch_add