Re: INSERT INTO SELECT, Why Parallelism is not selected?

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date: 2020-07-30 06:32:46
Message-ID: CAA4eK1L3Dca1zmPptcYzqZUa8qfARxYkpZrRfas19vpyEaHBFQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 29, 2020 at 7:18 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> I still don't agree with this as proposed.
>
> + * For now, we don't allow parallel inserts of any form not even where the
> + * leader can perform the insert. This restriction can be uplifted once
> + * we allow the planner to generate parallel plans for inserts. We can
>
> If I'm understanding this correctly, this logic is completely
> backwards. We don't prohibit inserts here because we know the planner
> can't generate them. We prohibit inserts here because, if the planner
> somehow did generate them, it wouldn't be safe. You're saying that
> it's not allowed because we don't try to do it yet, but actually it's
> not allowed because we want to make sure that we don't accidentally
> try to do it. That's very different.
>

Right, so how about something like: "To allow parallel inserts, we
need to ensure that they are safe to be performed in workers. We have
the infrastructure to allow parallel inserts in general, except for the
case where inserts generate a new command id (e.g. inserts into a table
having a foreign key column)." We can extend this to cover tuple locking
if required, as per the discussion below. Kindly suggest if you prefer
different wording here.

>
> + * We should be able to parallelize
> + * the later case if we can ensure that no two parallel processes can ever
> + * operate on the same page.
>
> I don't know whether this is talking about two processes operating on
> the same page at the same time, or ever within a single query
> execution. If it's the former, perhaps we need to explain why that's a
> concern for parallel query but not otherwise;
>

I am talking about the former case. I know that, as per the current
design, two worker processes cannot try to operate on the same page at
the same time, but I was trying to be pessimistic so that we can
enforce that via some form of Assert. I don't know whether it is
important to mention this case, but I have mentioned it for the sake of
extra safety.

--
With Regards,
Amit Kapila.
