Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-10-19 17:17:01
Message-ID: CALj2ACU8CHmE20dJhnr125s1oCugtgUwJSUbsLkJi3=ymnMKyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 15, 2020 at 3:18 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > > > 1. How to represent the parallel insert for CTAS in explain plans?
The
> > > > explain CTAS shows the plan for only the SELECT part. How about
having
> > > > some textual info along with the Gather node? I'm not quite sure on
> > > > this point, any suggestions are welcome.
> > >
> > > I am also not sure about this point because we don't display anything
> > > for the DDL part in explain. Can you propose by showing some example
> > > of what you have in mind?
> >
> > I thought we could have something like this.
> >
-----------------------------------------------------------------------------
> > Gather (cost=1000.00..108738.90 rows=0 width=8)
> > Workers Planned: 2 Parallel Insert on t_test1
> > -> Parallel Seq Scan on t_test (cost=0.00..106748.00
rows=4954 width=8)
> > Filter: (many < 10000)
> >
-----------------------------------------------------------------------------
>
> maybe something like below:
> Gather (cost=1000.00..108738.90 rows=0 width=8)
> -> Create t_test1
> -> Parallel Seq Scan on t_test
>
> I don't know what is the best thing to do here. I think for the
> temporary purpose you can keep something like above then once the
> patch is matured then we can take a separate opinion for this.
>

Agreed. Here's a snapshot of explain with the change suggested.

postgres=# EXPLAIN (ANALYZE, COSTS OFF) CREATE TABLE t1_test AS SELECT *
FROM t1;
QUERY PLAN
---------------------------------------------------------------------------------
Gather (actual time=970.524..972.913 rows=0 loops=1)
* -> Create t1_test*
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t1 (actual time=0.028..86.623 rows=333333
loops=3)
Planning Time: 0.049 ms
Execution Time: 973.733 ms

>
> I think there is no reason why one can't use ORDER BY in the
> statements we are talking about here. But, I think we can't enable
> parallelism for GatherMerge is because for that node we always need to
> fetch the data in the leader backend to perform the final merge phase.
> So, I was expecting a small comment saying something on those lines.
>

Added comments.

>
> 2. Addition of new test cases.
>

Added new test cases.

>
> Analysis on the 2 mismatches in write_parallel.sql regression test.
>

Done. It needed a small code change in costsize.c. Now, both make check and
make check-world passes.

Apart from above, a couple of other things I have finished with the v3
patch.

1. Both make check and make check-world with force_parallel_mode = regress
passes.
2. I enabled parallel inserts in case of materialized views. Hope that's
fine.

Attaching v3 patch herewith.

I'm done with all the open points in my list. Please review the v3 patch
and provide comments.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v3-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch application/x-patch 44.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-10-19 18:37:02 Re: Probable documentation errors or improvements
Previous Message Justin Pryzby 2020-10-19 16:45:29 Re: Probable documentation errors or improvements