Re: Parallel Inserts in CREATE TABLE AS

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-26 15:50:45
Message-ID: CALDaNm1y7rYZi9+-M738Ch=BHhM+C=V4saeXyxctO+NLdWdALQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Thu, Dec 24, 2020 at 10:25 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > You could change intoclause_len = strlen(intoclausestr) to
> > strlen(intoclausestr) + 1 and use intoclause_len in the remaining
> > places. We can avoid the +1 in the other places.
> > + /* Estimate space for into clause for CTAS. */
> > + if (IS_CTAS(intoclause) && OidIsValid(objectid))
> > + {
> > + intoclausestr = nodeToString(intoclause);
> > + intoclause_len = strlen(intoclausestr);
> > + shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
> > + shm_toc_estimate_keys(&pcxt->estimator, 1);
> > + }
>
> Done.
>
> > Can we use node->nworkers_launched == 0 in place of
> > node->need_to_scan_locally, that way the setting and resetting of
> > node->need_to_scan_locally can be removed. Unless need_to_scan_locally
> > is needed in any of the functions that gets called.
> > + /* Enable leader to insert in case no parallel workers were launched. */
> > + if (node->nworkers_launched == 0)
> > + node->need_to_scan_locally = true;
> > +
> > + /*
> > + * By now, for parallel workers (if launched any), would have
> > started their
> > + * work i.e. insertion to target table. In case the leader is chosen to
> > + * participate for parallel inserts in CTAS, then finish its
> > share before
> > + * going to wait for the parallel workers to finish.
> > + */
> > + if (node->need_to_scan_locally)
> > + {
>
> need_to_scan_locally is being set in ExecGather() even if
> nworkers_launched > 0 it can still be true, so I think we can not
> remove need_to_scan_locally in ExecParallelInsertInCTAS.
>
> Attaching v15 patch set for further review. Note that the change is
> only in 0001 patch, other patches remain unchanged from v14.
>

+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change comment "parallel inserts must occur" like "parallel
insert must be selected for CTAS on normal table"

+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change comment "parallel inserts must occur" like "parallel
insert must be selected for CTAS on unlogged table"
Similar comment need to be handled in other places also.

+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+ ln text;
+begin
+ for ln in
+ execute format('explain (analyze, costs off, summary off,
timing off) %s',
+ $1)
+ loop
+ ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers
Launched: N');
+ ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual
rows=N loops=N');
+ ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows
Removed by Filter: N');
+ return next ln;
+ end loop;
+end;
+$$;

The above function is same as function present in partition_prune.sql:
create function explain_parallel_append(text) returns setof text
language plpgsql as
$$
declare
ln text;
begin
for ln in
execute format('explain (analyze, costs off, summary off,
timing off) %s',
$1)
loop
ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers
Launched: N');
ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual
rows=N loops=N');
ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows
Removed by Filter: N');
return next ln;
end loop;
end;
$$;

If possible try to make a common function for both and use.

+ if (intoclausestr && OidIsValid(objectid))
+ fpes->objectid = objectid;
+ else
+ fpes->objectid = InvalidOid;
Here OidIsValid(objectid) check is not required intoclausestr will be
set only if OidIsValid.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-12-26 16:45:41 Re: pgsql: Add pg_alterckey utility to change the cluster key
Previous Message vignesh C 2020-12-26 15:48:13 Re: Parallel copy