Re: Parallel INSERT (INTO ... SELECT ...)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-08 21:41:04
Message-ID: CA+hUKGLpbvhEo_BMZ8gZkUfwJtMUcKPUjnzLmXY_eSLbf3=jLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 6, 2020 at 10:38 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
+ if (estate->es_plannedstmt->commandType == CMD_INSERT)
...
+ if ((XactReadOnly || (IsInParallelMode() &&
queryDesc->plannedstmt->commandType != CMD_INSERT)) &&
...
+ isParallelInsertLeader = nodeModifyTableState->operation == CMD_INSERT;
...

One thing I noticed is that you have logic, variable names and
assertions all over the tree that assume that we can only do parallel
*inserts*. I agree 100% with your plan to make Parallel Insert work
first, it is an excellent goal and if we get it in it'll be a headline
feature of PG14 (along with COPY etc). That said, I wonder if it
would make sense to use more general naming (isParallelModifyLeader?),
be more liberal where you really mean "is it DML", and find a way to
centralise the logic about which DML commands types are currently
allowed (ie insert only for now) for assertions and error checks etc,
so that in future we don't have to go around and change all these
places and rename things again and again.

While contemplating that, I couldn't resist taking a swing at the main
(?) show stopper for Parallel Update and Parallel Delete, judging by
various clues left in code comments by Robert: combo command IDs
created by other processes. Here's a rapid prototype to make that
work (though perhaps not as efficiently as we'd want, not sure). With
that in place, I wonder what else we'd need to extend your patch to
cover all three operations... it can't be much! Of course I don't
want to derail your work on Parallel Insert, I'm just providing some
motivation for my comments on the (IMHO) shortsightedness of some of
the coding.

PS Why not use git format-patch to create patches?

Attachment Content-Type Size
0001-Coordinate-combo-command-IDs-with-parallel-workers.patch text/x-patch 23.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-10-08 21:54:08 Re: [HACKERS] Custom compression methods
Previous Message Ranier Vilela 2020-10-08 21:27:39 Re: [PATCH] ecpg: fix progname memory leak