Re: Parallel INSERT (INTO ... SELECT ...)

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-18 01:03:11
Message-ID: CAJcOf-dENt5-aphCX0LiKDihBHEpUTV0w5YKCmDpJdxX2YCn4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 18, 2021 at 12:34 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> > Your revised version seems OK, though I do have a concern:
> > Is the use of "table_close(rel, NoLock)'' intentional? That will keep
> > the lock (lockmode) until end-of-transaction.
>
> I think we always keep any locks on relations that are involved in a
> plan until end-of-transaction. What if a partition is changed in an
> unsafe manner between being considered safe for parallel insertion and
> actually performing the parallel insert?
>
> BTW, I just noticed that exctract_query_dependencies() runs on a
> rewritten, but not-yet-planned query tree, that is, I didn't know that
> extract_query_dependencies() only populates the CachedPlanSource's
> relationOids and not CachedPlan's. The former is only for tracking
> the dependencies of an unplanned Query, so partitions should never be
> added to it. Instead, they should be added to
> PlannedStmt.relationOids (note PlannedStmt belongs to CachedPlan),
> which is kind of what your earlier patch did. Needless to say,
> PlanCacheRelCallback checks both CachedPlanSource.relationOids and
> PlannedStmt.relationOids, so if it receives a message about a
> partition, its OID is matched from the latter.
>
> All that is to say that we should move our code to add partition OIDs
> as plan dependencies to somewhere in set_plan_references(), which
> otherwise populates PlannedStmt.relationOids. I updated the patch to
> do that.

OK, understood. Thanks for the detailed explanation.

> It also occurred to me that we can avoid pointless adding of
> partitions if the final plan won't use parallelism. For that, the
> patch adds checking glob->parallelModeNeeded, which seems to do the
> trick though I don't know if that's the correct way of doing that.
>

I'm not sure if's pointless adding partitions even in the case of NOT
using parallelism, because we may be relying on the result of
parallel-safety checks on partitions in both cases.
For example, insert_parallel.sql currently includes a test (that you
originally provided in a previous post) that checks a non-parallel
plan is generated after a parallel-unsafe trigger is created on a
partition involved in the INSERT.
If I further add to that test by then dropping that trigger and then
again using EXPLAIN to see what plan is generated, then I'd expect a
parallel-plan to be generated, but with the setrefs-v3.patch it still
generates a non-parallel plan. So I think the "&&
glob->parallelModeNeeded" part of test needs to be removed.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-02-18 01:31:44 Re: PoC/WIP: Extended statistics on expressions
Previous Message Euler Taveira 2021-02-18 00:27:56 Re: [PATCH] pg_hba.conf error messages for logical replication connections