Re: INSERT INTO SELECT, Why Parallelism is not selected?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date: 2020-08-18 13:38:55
Message-ID: CAA4eK1Ks8Sqs29VHPS6koNj5E9YQdkGCzgGsSrQMeUbQfe28yg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 18, 2020 at 1:37 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Jul 14, 2020 at 1:20 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > I think we can do more than this by
> > > parallelizing the Insert part of this query as well as we have lifted
> > > group locking restrictions related to RelationExtension and Page lock
> > > in PG13. It would be really cool to do that unless we see any
> > > fundamental problems with it.
> >
> > +1, I think it will be cool to support for insert part here as well as
> > insert part in 'Create Table As Select..' as well.
> >
>
> +1 to parallelizing inserts. Currently, ExecInsert() and CTAS use
> table_tuple_insert(). If we parallelize these parts, each worker will
> be inserting its tuples (one tuple at a time) into the same data page
> while space is available; once any one worker obtains a new data page,
> the others might start inserting into it as well. This way, will there
> be lock contention on data pages?
>

It is possible, but we need to check how much of a bottleneck that is,
because it should not be a big part of the operation. And it won't be
any worse than inserts via multiple backends. I think it is important
to do it that way; otherwise, some of the pages can remain half-empty.

Right now, the plan for Insert ... Select looks like:
Insert on <tbl_x>
-> Seq Scan on <tbl_y>
....

In the above, the scan could be an index scan as well. What we want is:
Gather
-> Insert on <tbl_x>
-> Seq Scan on <tbl_y>
....

> Do we also need to make
> inserts use table_multi_insert() (the way COPY does) instead
> of table_tuple_insert()?
>

I am not sure at this stage, but if it turns out to be a big problem
then we might think of inventing some way to allow individual workers
to operate on different pages. I think even without that we should be
able to make a big gain, as reads, filtering, etc. can be done in
parallel.
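
To make the difference concrete, here is a minimal sketch (not the
actual executor code) of the two tableam entry points being discussed;
the helper names and the zero options/NULL bistate are made up purely
for illustration:

#include "postgres.h"
#include "access/tableam.h"
#include "executor/tuptable.h"
#include "utils/rel.h"

/* Per-tuple path, roughly what ExecInsert()/CTAS do today. */
static void
sketch_insert_one(Relation rel, TupleTableSlot *slot, CommandId cid)
{
    /* one table AM call per row */
    table_tuple_insert(rel, slot, cid, 0 /* options */, NULL /* bistate */);
}

/* Batched path, roughly what COPY does via table_multi_insert(). */
static void
sketch_insert_batch(Relation rel, TupleTableSlot **slots, int nslots,
                    CommandId cid)
{
    /* one call flushes a whole buffer of slots */
    table_multi_insert(rel, slots, nslots, cid, 0 /* options */,
                       NULL /* bistate */);
}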

--
With Regards,
Amit Kapila.
