Re: adding partitioned tables to publications

From: 赵锐 <875941708(at)qq(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: adding partitioned tables to publications
Date: 2020-12-30 14:15:24
Message-ID: tencent_41FEA657C206F19AB4F406BE9252A0F69C06@qq.com
Lists: pgsql-hackers

The first file of Amit's patch not only rearranges the code but also fixes a hidden bug.
To make the bug easy to see, I attach a separate patch.
"RelationIdGetRelation" adds a reference to owner->relrefarr; without a matching "RelationClose", owner->relrefarr keeps growing and has to be enlarged and re-hashed.
Once owner->relrefarr holds more than 10 million entries, enlarging and re-hashing takes several hours. Worse, adding references also takes minutes, because hash collisions are resolved by scanning the array in order.
When we publish 10 billion rows under one partitioned table, adding references, enlarging and re-hashing takes several days, with the CPU at 99% the whole time.
With my patch applied, the 10 billion rows are published in 10 minutes.
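
As a rough illustration of the growth described above, here is a toy model in plain C. It is not PostgreSQL code: RefArray, toy_open, toy_close and simulate are hypothetical names, and the simple growable array only stands in for owner->relrefarr without modelling its hashing. It shows that an open without a matching close leaves one entry behind per change, while a paired open/close keeps the array at its initial size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a per-owner reference array (NOT PostgreSQL's code). */
typedef struct RefArray {
    int    *items;     /* remembered relation ids, one entry per reference */
    size_t  count;     /* references currently held */
    size_t  capacity;  /* allocated slots */
    size_t  enlarges;  /* how many times the array had to grow */
} RefArray;

static void ref_array_init(RefArray *a)
{
    a->capacity = 16;
    a->items = malloc(a->capacity * sizeof(int));
    assert(a->items != NULL);
    a->count = 0;
    a->enlarges = 0;
}

/* Analogue of taking a reference: every call adds one entry. */
static void toy_open(RefArray *a, int relid)
{
    if (a->count == a->capacity) {
        int *bigger = realloc(a->items, a->capacity * 2 * sizeof(int));
        assert(bigger != NULL);
        a->items = bigger;
        a->capacity *= 2;
        a->enlarges++;
    }
    a->items[a->count++] = relid;
}

/* Analogue of releasing a reference: drops the most recent matching entry. */
static void toy_close(RefArray *a, int relid)
{
    for (size_t i = a->count; i-- > 0;) {
        if (a->items[i] == relid) {
            a->items[i] = a->items[a->count - 1];
            a->count--;
            return;
        }
    }
    assert(!"toy_close called without a matching toy_open");
}

/* Process nchanges changes for one table, with or without the release. */
static RefArray simulate(size_t nchanges, int release)
{
    RefArray a;
    ref_array_init(&a);
    for (size_t i = 0; i < nchanges; i++) {
        toy_open(&a, 42);          /* same table id on every change */
        if (release)
            toy_close(&a, 42);     /* the step the fix adds */
    }
    return a;                      /* caller frees a.items */
}
```

In this sketch, simulate(100000, 0) ends with 100000 entries still held after repeated enlargements, whereas simulate(100000, 1) ends with none and never grows.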

------------------ Original ------------------
From: "Amit Langote" <amitlangote09(at)gmail(dot)com>
Send time: Friday, Apr 17, 2020 10:58 PM
To: "Peter Eisentraut" <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: "Petr Jelinek" <petr(at)2ndquadrant(dot)com>; "Rafia Sabih" <rafia(dot)pghackers(at)gmail(dot)com>; "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: adding partitioned tables to publications

On Fri, Apr 17, 2020 at 10:23 PM Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> On 2020-04-09 09:28, Amit Langote wrote:
> > While figuring this out, I thought the nearby code could be rearranged
> > a bit, especially to de-duplicate the code.  Also, I think
> > get_rel_sync_entry() may be a better place to set the map, rather than
> > maybe_send_schema().  Thoughts?
>
> because I didn't really have an opinion on that at the time, but if you
> still want it considered or have any open thoughts on this thread,
> please resend or explain.

Sure, thanks for taking care of the bug.

Rebased the code rearrangement patch.  Also resending the patch to fix
TAP tests for improving coverage as described in:
https://www.postgresql.org/message-id/CA%2BHiwqFyydvQ5g%3Dqa54UM%2BXjm77BdhX-nM4dXQkNOgH%3DzvDjoA%40mail.gmail.com

To summarize:
1. Missing coverage for a couple of related blocks in
apply_handle_tuple_routing()
2. Missing coverage report for the code in pgoutput.c added by 83fd4532

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-RelationClose-after-RelationIdGetRelation.patch application/octet-stream 2.5 KB
