Re: Parallel Inserts in CREATE TABLE AS

From: Luc Vlaming <luc(at)swarm64(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-11-27 07:37:56
Message-ID: 637d88b6-f5a4-7d46-8b3b-00a2d61d0e77@swarm64.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 25-11-2020 03:40, Bharath Rupireddy wrote:
> On Tue, Nov 24, 2020 at 4:43 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
>>
>> I'm very interested in this feature,
>> and I'm looking at the patch, here are some comments.
>>
>
> Thanks for the review.
>
>>
>> How about the following style:
>>
>> if(TupIsNull(outerTupleSlot))
>> Break;
>>
>> (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
>> node->ps.state->es_processed++;
>>
>> Which looks cleaner.
>>
>
> Done.
>
>>
>> The check can be replaced by ISCTAS(into).
>>
>
> Done.
>
>>
>> 'inerst' looks like a typo (insert).
>>
>
> Corrected.
>
>>
>> The code here call strlen(intoclausestr) for two times,
>> After checking the existing code in ExecInitParallelPlan,
>> It used to store the strlen in a variable.
>>
>> So how about the following style:
>>
>> intoclause_len = strlen(intoclausestr);
>> ...
>> /* Store serialized intoclause. */
>> intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
>> memcpy(shmptr, intoclausestr, intoclause_len + 1);
>> shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);
>>
>
> Done.
>
>>
>> The two check about intoclausestr seems can be combined like:
>>
>> if (intoclausestr != NULL)
>> {
>> ...
>> }
>> else
>> {
>> ...
>> }
>>
>
> Done.
>
> Attaching v5 patch. Please consider it for further review.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

Disclaimer: I have by no means throughly reviewed all the involved parts
and am probably missing quite a bit of context so if I understood parts
wrong or they have been discussed before then I'm sorry. Most notably
the whole situation about the command-id is still elusive for me and I
can really not judge yet anything related to that.

IMHO The patch makes that we now have the gather do most of the CTAS
work, which seems unwanted. For the non-ctas insert/update case it seems
that a modifytable node exists to actually do the work. What I'm
wondering is if it is maybe not better to introduce a CreateTable node
as well?
This would have several merits:
- the rowcount of that node would be 0 for the parallel case, and
non-zero for the serial case. Then the gather ndoe and the Query struct
don't have to know about CTAS for the most part, removing e.g. the case
distinctions in cost_gather.
- the inserted rows can now be accounted in this new node instead of the
parallel executor state, and this node can also do its own DSM
intializations
- the generation of a partial variants of the CreateTable node can now
be done in the optimizer instead of the ExecCreateTableAs which IMHO is
a more logical place to make these kind of decisions. which then also
makes it potentially play nicer with costs and the like.
- the explain code can now be in its own place instead of part of the
gather node
- IIUC it would allow the removal of the code to only launch parallel
workers if its not CTAS, which IMHO would be quite a big benefit.

Thoughts?

Some small things I noticed while going through the patch:
- Typo for the comment about "inintorel_startup" which should be
intorel_startup
- if (node->nworkers_launched == 0 && !node->need_to_scan_locally)

can be changed into
if (node->nworkers_launched == 0
because either way it'll be true.

Regards,
Luc
Swarm64

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-11-27 07:46:04 Re: Multi Inserts in CREATE TABLE AS - revived patch
Previous Message Alexander Korotkov 2020-11-27 07:27:08 Re: Improving spin-lock implementation on ARM.