Re: Determine parallel-safety of partition relations for Inserts

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Determine parallel-safety of partition relations for Inserts
Date: 2021-01-15 12:23:03
Message-ID: CAExHW5s_xBB4cyN12=1BcvoT5Of-kUWtcDGwZ3UNv1f356trVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 15, 2021 at 3:48 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> While reviewing parallel insert [1] (Insert into .... Select) and
> parallel copy patches [2], it came to my notice that both the patches
> traverse the entire partition hierarchy to determine parallel-safety
> of partitioned relations. This is required because before considering
> the Insert or Copy can be considered for parallelism, we need to
> determine whether it is safe to do so. We need to check for each
> partition because any of the partitions can have some parallel-unsafe
> index expression, constraint, etc. We do a similar thing for Selects
> in standard_planner.
>
> The plain Select case for partitioned tables was simpler because we
> anyway loop through all the partitions in set_append_rel_size() and we
> determine parallel-safety of each partition at that time but the same
> is not true for Inserts.
>
> For Inserts, currently, we only open the partition table when we are
> about to insert into that partition. During ExecInsert, we find out
> the partition matching the partition-key value and then lock if it is
> not already locked. In this patch, we need to open each partition at
> the planning time to determine its parallel-safety.

We don't want to open the partitions where no rows will be inserted.

>
> This will surely increase planning time but the execution is reduced
> to an extent due to parallelism that it won't matter for either of the
> cases if we see just total time. For example, see the latest results
> for parallel inserts posted by Haiying Tang [3]. There might be an
> impact when Selects can't be parallelized due to the small size of the
> Select-table but we still have to traverse all the partitions to
> determine parallel-safety but not sure how much it is compared to
> overall time. I guess we need to find the same but apart from that can
> anyone think of a better way to determine parallel-safety of
> partitioned relation for Inserts?

In case of SELECT we open only those partitions which surive pruning.
So those are the ones which will definitely required to be scanned. We
perform parallelism checks only on those partitions. The actual check
isn't much costly.

>
> Thoughts?
>
> Note: I have kept a few people in Cc who are either directly involved
> in this work or work regularly in the partitioning related work just
> in the hope that might help in moving the discussion forward.

Since you brought up comparison between SELECT and INSERT, "pruning"
partitions based on the values being INSERTed might help. It should be
doable in case of INSERT ... SELECT where we need to prune partitions
based on the clauses of SELECT. Doable with some little effort in case
of VALUEs and COPY.

Second possibility is to open partitions only when the estimated
number of rows to be inserted goes beyond a certain value.

Third idea is to use something similar to parallel append where
individual partitions are scanned sequentially but multiple partitions
are scanned in parallel. When a row is inserted into a non-yet-opened
partition, allocate one/more backends to insert into partitions which
do not allow parallelism, otherwise continue to use a common pool of
parallel workers for insertion. This means the same thread performing
select may not perform insert. So some complications will be involved.

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-01-15 12:53:33 Re: Commitfest 2021-01 Now in Progress
Previous Message Kyotaro Horiguchi 2021-01-15 12:22:58 Re: Wrong HINT during database recovery when occur a minimal wal.