Re: INSERT INTO SELECT, Why Parallelism is not selected?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date: 2020-09-09 04:50:40
Message-ID: CAA4eK1K9RgqTDWntnRSBdnpEw2JbD3f6N=Dye_53f=N1sbYbiw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 30, 2020 at 6:42 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jul 30, 2020 at 12:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > >
> > > I still don't agree with this as proposed.
> > >
> > > + * For now, we don't allow parallel inserts of any form not even where the
> > > + * leader can perform the insert. This restriction can be uplifted once
> > > + * we allow the planner to generate parallel plans for inserts. We can
> > >
> > > If I'm understanding this correctly, this logic is completely
> > > backwards. We don't prohibit inserts here because we know the planner
> > > can't generate them. We prohibit inserts here because, if the planner
> > > somehow did generate them, it wouldn't be safe. You're saying that
> > > it's not allowed because we don't try to do it yet, but actually it's
> > > not allowed because we want to make sure that we don't accidentally
> > > try to do it. That's very different.
> > >
> >
> > Right, so how about something like: "To allow parallel inserts, we
> > need to ensure that they are safe to be performed in workers. We have
> > the infrastructure to allow parallel inserts in general except for the
> > case where inserts generate a new commandid (e.g., inserts into a table
> > having a foreign key column)."

Robert, Dilip, do you see any problem with changing the comment on the
above lines to this wording? Feel free to suggest an alternative if you
have something better in mind.
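
To make the rule in the proposed wording concrete, here is a minimal
standalone sketch. It is not PostgreSQL code: the InsertTarget struct and
both function names are hypothetical, and the foreign-key flag stands in
for "this insert would generate a new commandid", which is the only
unsafe case named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified description of an insert target (assumption). */
typedef struct InsertTarget
{
    bool has_foreign_key;   /* FK checks are the example of a new commandid */
} InsertTarget;

/* Assumption: a foreign key is the only source of a new commandid here. */
static bool
insert_needs_new_command_id(const InsertTarget *target)
{
    assert(target != NULL);
    return target->has_foreign_key;
}

/* An insert is treated as worker-safe only if it needs no new commandid. */
static bool
insert_is_parallel_safe(const InsertTarget *target)
{
    return !insert_needs_new_command_id(target);
}
```

With this model, a plain table is reported as safe and a table with a
foreign key column is not; a real check would have to cover every path
that can assign a new commandid, not just this one.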

> > We can extend this for tuple locking
> > if required as per the below discussion. Kindly suggest if you prefer
> > a different wording here.
> >

I feel we can leave out the tuple-locking part, based on the reasoning
provided below.

> > >
> > > + * We should be able to parallelize
> > > + * the later case if we can ensure that no two parallel processes can ever
> > > + * operate on the same page.
> > >
> > > I don't know whether this is talking about two processes operating on
> > > the same page at the same time, or ever within a single query
> > > execution. If it's the former, perhaps we need to explain why that's a
> > > concern for parallel query but not otherwise;
> > >
> >
> > I am talking about the former case and I know that as per current
> > design it is not possible that two worker processes try to operate on
> > the same page but I was trying to be pessimistic so that we can ensure
> > that via some form of Assert.
> >
>
> I think two worker processes can operate on the same page in the
> parallel index scan case, but not on the same tuple. I am not able
> to think of any case where we should be worried about tuple locking
> for the insert case, so we can probably skip writing anything about it
> unless someone else can think of such a case.
>

--
With Regards,
Amit Kapila.
