Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-09-25 12:17:42
Message-ID: CAA4eK1+9xD2L4oh0UMQT=V+bZ2ULnPK8=HU+k7rwYMSWBQbbPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 25, 2020 at 10:02 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> Hi Andres,
>
> On Thu, Sep 24, 2020 at 12:21 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
>
>
> >> @@ -116,7 +117,7 @@ toast_save_datum(Relation rel, Datum value,
> >> TupleDesc toasttupDesc;
> >> Datum t_values[3];
> >> bool t_isnull[3];
> >> - CommandId mycid = GetCurrentCommandId(true);
> >> + CommandId mycid = GetCurrentCommandId(!IsParallelWorker());
> >> struct varlena *result;
> >> struct varatt_external toast_pointer;
> >> union
> >
> >Hm? Why do we need this in the various places you have made this change?
>
> It's because for Parallel INSERT, we're assigning the same command-id
> to each worker up-front during worker initialization (the commandId
> has been retrieved by the leader and passed through to each worker)
> and "currentCommandIdUsed" has been set true. See the
> AssignCommandIdForWorker() function in the patch.
> If you see the code of GetCurrentCommandId(), you'll see it Assert
> that it's not being run by a parallel worker if the parameter is true.
> I didn't want to remove yet another check, without being able to know
> the context of the caller, because only for Parallel INSERT do I know
> that "currentCommandIdUsed was already true at the start of the
> parallel operation". See the comment in that function. Anyway, that's
> why I'm passing "false" to relevant GetCurrentCommandId() calls if
> they're being run by a parallel (INSERT) worker.
>

But we can tighten the condition in GetCurrentCommandId() such that it
Asserts for parallel worker only when currentCommandIdUsed is not set
before start of parallel operation. I also find these changes in the
callers of GetCurrentCommandId() quite adhoc and ugly even if they are
correct. Also, why we don't face a similar problems for parallel copy?

>
> >> diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
> >> index a4944fa..9d3f100 100644
> >> --- a/src/backend/access/transam/varsup.c
> >> +++ b/src/backend/access/transam/varsup.c
> >> @@ -53,13 +53,6 @@ GetNewTransactionId(bool isSubXact)
> >> TransactionId xid;
> >>
> >> /*
> >> - * Workers synchronize transaction state at the beginning of each parallel
> >> - * operation, so we can't account for new XIDs after that point.
> >> - */
> >> - if (IsInParallelMode())
> >> - elog(ERROR, "cannot assign TransactionIds during a parallel operation");
> >> -
> >> - /*
> >> * During bootstrap initialization, we return the special bootstrap
> >> * transaction id.
> >> */
> >
> >Same thing, this code cannot just be allowed to be reachable. What
> >prevents you from assigning two different xids from different workers
> >etc?
>
> At least in the case of Parallel INSERT, the leader for the Parallel
> INSERT gets a new xid (GetCurrentFullTransactionId) and it is passed
> through and assigned to each of the workers during their
> initialization (so they are assigned the same xid).
>

So are you facing problems in this area because we EnterParallelMode
before even assigning the xid in the leader? Because I don't think we
should ever reach this code in the worker. If so, there are two
possibilities that come to my mind (a) assign xid in leader before
entering parallel mode or (b) change the check so that we don't assign
the new xid in workers. In this case, I am again wondering how does
parallel copy dealing this?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-09-25 12:19:50 Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Previous Message legrand legrand 2020-09-25 11:57:37 Re: enable pg_stat_statements to track rows processed by REFRESH MATERIALIZED VIEW