Determine parallel-safety of partition relations for Inserts

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Determine parallel-safety of partition relations for Inserts
Date: 2021-01-15 10:20:46
Message-ID: CAA4eK1K-cW7svLC2D7DHoGHxdAdg3P37BLgebqBOC2ZLc9a6QQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While reviewing parallel insert [1] (Insert into .... Select) and
parallel copy patches [2], it came to my notice that both the patches
traverse the entire partition hierarchy to determine parallel-safety
of partitioned relations. This is required because before considering
the Insert or Copy can be considered for parallelism, we need to
determine whether it is safe to do so. We need to check for each
partition because any of the partitions can have some parallel-unsafe
index expression, constraint, etc. We do a similar thing for Selects
in standard_planner.

The plain Select case for partitioned tables was simpler because we
anyway loop through all the partitions in set_append_rel_size() and we
determine parallel-safety of each partition at that time but the same
is not true for Inserts.

For Inserts, currently, we only open the partition table when we are
about to insert into that partition. During ExecInsert, we find out
the partition matching the partition-key value and then lock if it is
not already locked. In this patch, we need to open each partition at
the planning time to determine its parallel-safety.

This will surely increase planning time but the execution is reduced
to an extent due to parallelism that it won't matter for either of the
cases if we see just total time. For example, see the latest results
for parallel inserts posted by Haiying Tang [3]. There might be an
impact when Selects can't be parallelized due to the small size of the
Select-table but we still have to traverse all the partitions to
determine parallel-safety but not sure how much it is compared to
overall time. I guess we need to find the same but apart from that can
anyone think of a better way to determine parallel-safety of
partitioned relation for Inserts?

Thoughts?

Note: I have kept a few people in Cc who are either directly involved
in this work or work regularly in the partitioning related work just
in the hope that might help in moving the discussion forward.

[1] - https://www.postgresql.org/message-id/CAJcOf-dF9ohqub_D805k57Y_AuDLeAQfvtaax9SpwjTSEVdiXg%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm32c7kWpZm9UkS5P%2BShsfRhyMTWKvFHyn9O53WZWvO2iA%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/b54f2e306780449093c311118cd8a04e%40G08CNEXMBPEKD05.g08.fujitsu.local

--
With Regards,
Amit Kapila.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-01-15 10:32:56 Re: Online checksums patch - once again
Previous Message vignesh C 2021-01-15 09:46:48 Re: Added schema level support for publication.