Re: INSERT INTO SELECT, Why Parallelism is not selected?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date: 2020-07-30 13:12:44
Message-ID: CAA4eK1JXVAqec8ocWoWYiT7trh=YXe+87Afd=nr5j9AnWnNefw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 30, 2020 at 12:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > I still don't agree with this as proposed.
> >
> > + * For now, we don't allow parallel inserts of any form not even where the
> > + * leader can perform the insert. This restriction can be uplifted once
> > + * we allow the planner to generate parallel plans for inserts. We can
> >
> > If I'm understanding this correctly, this logic is completely
> > backwards. We don't prohibit inserts here because we know the planner
> > can't generate them. We prohibit inserts here because, if the planner
> > somehow did generate them, it wouldn't be safe. You're saying that
> > it's not allowed because we don't try to do it yet, but actually it's
> > not allowed because we want to make sure that we don't accidentally
> > try to do it. That's very different.
> >
>
> Right, so how about something like: "To allow parallel inserts, we
> need to ensure that they are safe to be performed in workers. We have
> the infrastructure to allow parallel inserts in general except for the
> case where inserts generate a new commandid (e.g., inserts into a table
> having a foreign key column)." We can extend this for tuple locking
> if required as per the below discussion. Kindly suggest if you prefer
> a different wording here.
>
> >
> > + * We should be able to parallelize
> > + * the later case if we can ensure that no two parallel processes can ever
> > + * operate on the same page.
> >
> > I don't know whether this is talking about two processes operating on
> > the same page at the same time, or ever within a single query
> > execution. If it's the former, perhaps we need to explain why that's a
> > concern for parallel query but not otherwise;
> >
>
> I am talking about the former case. I know that, as per the current
> design, it is not possible for two worker processes to operate on the
> same page, but I was being pessimistic so that we can ensure that via
> some form of Assert.
>

I think two worker processes can operate on the same page in the
parallel index scan case, but never on the same tuple. I can't think
of any case where we should be worried about tuple locking for
inserts, so we can probably skip writing anything about it unless
someone else can think of such a case.
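
To make Robert's point above concrete (the check exists because the
operation would be unsafe in a worker, not because the planner happens
not to generate such plans), here is a minimal standalone sketch. It is
not PostgreSQL source: toy_in_parallel_worker, toy_set_parallel_worker,
ToyTable and toy_insert are hypothetical names that only stand in for
the real worker check and insert path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag: true when this process plays the role of a parallel worker. */
static bool toy_in_parallel_worker = false;

typedef enum ToyInsertResult
{
    TOY_INSERT_OK,
    TOY_INSERT_REJECTED
} ToyInsertResult;

/* Hypothetical table: a small fixed-size array plus a row count. */
typedef struct ToyTable
{
    int     rows[8];
    size_t  nrows;
} ToyTable;

/* Switch the toy process between "leader" and "worker" roles. */
void
toy_set_parallel_worker(bool is_worker)
{
    toy_in_parallel_worker = is_worker;
}

/*
 * The guard is unconditional.  It does not ask whether any caller could
 * currently reach this point from a worker; it refuses because the insert
 * would be unsafe if one ever did.
 */
ToyInsertResult
toy_insert(ToyTable *table, int value)
{
    assert(table != NULL);

    if (toy_in_parallel_worker)
        return TOY_INSERT_REJECTED;     /* unsafe in a worker, so refuse */

    if (table->nrows >= sizeof(table->rows) / sizeof(table->rows[0]))
        return TOY_INSERT_REJECTED;     /* toy table is full */

    table->rows[table->nrows++] = value;
    return TOY_INSERT_OK;
}
```

With the flag unset (the leader role in this toy model), toy_insert
appends the row. Once the flag is set, the same call is refused and
the table is left unchanged, whoever the caller is.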

--
With Regards,
Amit Kapila.
