Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-09-28 03:14:27
Message-ID: CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c+Uu3XU5VMv37Aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 26, 2020 at 3:30 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> I further checked on full txn id and command id. Yes, these are
> getting passed to workers via InitializeParallelDSM() ->
> SerializeTransactionState(). I tried to summarize what we need to do
> in case of parallel inserts in general i.e. parallel COPY, parallel
> inserts in INSERT INTO and parallel inserts in CTAS.
>
> In the leader:
> GetCurrentFullTransactionId()
> GetCurrentCommandId(true)
> EnterParallelMode();
> InitializeParallelDSM() --> calls SerializeTransactionState()
> (both full txn id and command id are serialized into parallel DSM)
>
> In the workers:
> ParallelWorkerMain() --> calls StartParallelWorkerTransaction() (both
> full txn id and command id are restored into workers'
> CurrentTransactionState->fullTransactionId and currentCommandId)
> If the parallel workers are meant for insertions, then we need to set
> currentCommandIdUsed = true; Maybe we can lift the assert in
> GetCurrentCommandId(), if we don't want to touch that function, then
> we can have a new function GetCurrentCommandidInWorker() whose
> functionality will be same as GetCurrentCommandId() without the
> Assert(!IsParallelWorker());.
>
> Am I missing something?
>
> If the above points are true, we might have to update the parallel
> copy patch set, test the use cases and post separately in the parallel
> copy thread in coming days.
>

Hi Bharath,

I pretty much agree with your above points.

I've attached an updated Parallel INSERT...SELECT patch, that:
- Only uses existing transaction state serialization support for
transfer of xid and cid.
- Adds a "SetCurrentCommandIdUsedForWorker" function, for setting
currentCommandIdUsed=true at the start of a parallel operation (used
for Parallel INSERT case, where we know the currentCommandId has been
assigned to the worker at the start of the parallel operation).
- Tweaks the Assert condition within "used=true" parameter case in
GetCurrentCommandId(), so that it only fires if in a parallel worker
and currentCommandId is false - refer to the updated comment in that
function.
- Does not modify any existing GetCurrentCommandId() calls.
- Does not remove any existing parallel-related asserts/checks, except
for the "cannot insert tuples in a parallel worker" error in
heap_prepare_insert(). I am still considering what to do with the
original error-check here.
[- Does not yet cater for other exclusion cases that you and Vignesh
have pointed out]

This patch is mostly a lot cleaner, but does contain a possible ugly
hack, in that where it needs to call GetCurrentFullTransactionId(), it
must temporarily escape parallel-mode (recalling that parallel-mode is
set true right at the top of ExectePlan() in the cases of Parallel
INSERT/SELECT).

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment Content-Type Size
0003-ParallelInsertSelect.patch application/octet-stream 29.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Abhijit Menon-Sen 2020-09-28 03:39:24 [PATCH] SET search_path += octopus
Previous Message Kyotaro Horiguchi 2020-09-28 02:54:16 Re: New statistics for tuning WAL buffer size