| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Cc: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org> |
| Subject: | Re: Proposal: Cascade REPLICA IDENTITY changes to leaf partitions |
| Date: | 2026-01-26 02:51:10 |
| Message-ID: | 0FD48322-3939-471F-AE45-5B4C6B39E835@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Dec 29, 2025, at 16:24, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
>> On Dec 22, 2025, at 14:26, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> On Fri, Dec 19, 2025 at 11:14 AM Zhijie Hou (Fujitsu)
>> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>>>
>>> On Thursday, December 18, 2025 12:21 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>
>>>>> On Dec 17, 2025, at 16:48, Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
>>>> wrote:
>>>>>
>>>>> On Wednesday, December 17, 2025 3:56 PM Chao Li
>>>> <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>>> Thank you both for all your advice. Here comes my first
>>>>>> implementation of INHERIT in the attached v2 patch.
>>>>>>
>>>>>> On Wed, Dec 17, 2025 at 8:11 AM Euler Taveira
>>>> <mailto:euler(at)eulerto(dot)com> wrote:
>>>>>>
>>>>>>> I wondering if we use INHERIT as default. The main advantage is
>>>>>>> usability as Chao Li already mentioned. Is there any cases that
>>>>>>> having a different replica identity from parent/partitioned table makes
>>>> sense?
>>>>>>
>>>>>> We can leave this topic open for discussion. In my current
>>>>>> implementation, NO INHERIT is still the default. But if we decide to
>>>>>> switch the default, I can add a new commit that should include only 1
>>>>>> line code change in gram.y and a tiny doc update.
>>>>>>
>>>>>> 0001 - when a new partition is created, use the parent's replication
>>>>>> identity
>>>>>> 0002 - add INHERIT | NO INHERIT
>>>>>
>>>>
>>>> Hi Zhijie,
>>>>
>>>> Thanks for your feedback and linked information. I think this patch is avoiding
>>>> the hard problem of “index” RI.
>>>>
>>>>>
>>>>> I think there are several design considerations for this proposal:
>>>>>
>>>>> 1) Since the index names can vary across different partitions, what
>>>>> should be the expected behavior if a new partition cannot identify the
>>>>> same replica identity key as the root partitioned table?
>>>>
>>>> Index RI is skipped in this patch. INHERT only works for DEFAULT, FULL and
>>>> NONE.
>>>>
>>>
>>> I personally find skipping this part to be inconsistent, particularly given the
>>> existing disparities among ALTER TABLE commands related to partitioned table handling.
>>> Omitting this part introduces further inconsistency within the ALTER TABLE
>>> REPLICA IDENTITY.
>>>
>>
>> Fair point. I think one should summarize the previous discussions with
>> key differences and where the previous patch got stuck. Then, it would
>> be good to get some feedback from the people involved previously. If
>> there is an agreement that we can do INHERIT stuff for specific parts
>> then fine, otherwise, I think we need to address the index part as
>> well.
>
>> [1] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql
>> [2] https://www.postgresql.org/message-id/flat/OSBPR01MB2982A2738F16722899A50082FECB0%40OSBPR01MB2982.jpnprd01.prod.outlook.com#2e5388a7cde3c10430f8668a6c381b06
>
> Sure, let me try to summarize the two discussions.
>
> For [1], key participants included Álvaro Herrera (patch author), Dmitry Dolgov (reviewer/tester), Robert Haas, Simon Riggs, Michael Paquier, and Peter Eisentraut.
> ----
>
> Brief summary with rough timeline:
>
> * In 2019, Álvaro proposed a patch that essentially did the same thing as this patch. That patch also attempted to handle “index-based” replica identity, which my patch intentionally avoids.
> * Dmitry tested the patch and reported issues related to attaching partitions, including index-related errors.
> * Robert then pointed out that having REPLICA IDENTITY recurse was inconsistent with the behavior of other ALTER TABLE actions such as TABLESPACE, and emphasized that we should really try to think through *all* actions that might recurse to child partitions.
> * Álvaro noted the use of `ALTER TABLE ONLY`, suggesting that WITHOUT ONLY could recurse to children, while ONLY would affect just the parent.
> * Simon commented that recursing TABLESPACE changes could be problematic because they imply physical data rewrites.
> * Robert listed several ALTER TABLE actions that lacked consistent recurse behavior (identity columns, triggers, CLUSTER, OWNER, TABLESPACE, CHECK constraints).
> * This led to broader discussion about whether TABLESPACE/OWNER/etc. should recurse.
> * Michael echoed support for having REPLICA IDENTITY recurse, controlled via ONLY.
> * Peter pointed out that recursing ADD GENERATED AS IDENTITY / DROP IDENTITY may not be feasible.
> * Álvaro wanted to proceed with the patch.
> * Robert maintained that defining consistent semantics for all relevant ALTER TABLE actions should come first.
>
> Overall, the blocker was an unresolved semantic disagreement, rather than a concrete technical objection to the patch itself. There appeared to be broad agreement that:
> * New partitions should use the parent’s replica identity.
> * ONLY could be used to control whether replica identity changes recurse.
>
> However, the discussion about “semantic consistency” significantly broadened the scope, and there was no clear agreement on whether TABLESPACE/OWNER/etc. should recurse, which ultimately stalled the effort.
>
> For [2], key participants included: Takayuki Tsunakawa (patch author), Bharath Rupireddy (reviewer), Álvaro Herrera, Michael Paquier, and Kyotaro Horiguchi.
> ----
>
> Brief summary with rough timeline:
>
> * In 2020, Takayuki proposed a patch intended to propagate ALTER TABLE SET LOGGED / UNLOGGED to partitions.
> * Bharath reviewed the patch and raised a number of questions and edge cases.
> * There were initial discussions about the patch mechanics and expected behavior.
> * Álvaro then pointed out the strong relation to the earlier discussion in [1].
> * The focus of the discussion shifted to the more fundamental question of “what is the parent?”:
> * Takayuki viewed ALTER TABLE on a partitioned table as a convenient batch operation on existing partitions, with future partitions remaining independent.
> * Álvaro, Michael, and Kyotaro argued that changing a property on the parent should define the default for future partitions as well.
> * No clear agreement was reached on this semantic question, and the discussion expanded into broader concerns about consistency across ALTER TABLE actions.
> * Takayuki withdrew the patch
>
> Overall, [2] also fell under the umbrella of “semantic consistency”, the main discussion was not about replica identity itself.
>
> Current situation:
> ----
>
> * Resolving “semantic consistency” across ALTER TABLE actions in a single release appears to be the biggest challenge. However, addressing everything in one patch set does not seem realistic.
> * The core question of “what is the parent?” from [1] remains central. That said, the discussion appeared to lean toward the view that future partitions should inherit properties from the parent.
> * Different properties behave very differently. For example, propagating REPLICA IDENTITY is a metadata-only change and relatively safe, while propagating TABLESPACE implies physical data movement and is much riskier. As a result, each ALTER TABLE action may deserve its own discussion and patch set.
> * The inconsistency is not limited to ALTER TABLE but also exists in CREATE TABLE behavior. For example, a new partition inherits TABLESPACE from the parent, but not REPLICA IDENTITY.
> * “ALTER TABLE ONLY table_name” is commonly suggested as the mechanism to control whether changes should recurse to partitions.
>
> How to proceed?
> ----
>
> If we stop here, these inconsistencies will remain indefinitely, which I believe nobody really wants. With that in mind, I’d like to suggest a two-phase approach.
>
> Phase 1: Document current behavior and set expectations
>
> * Identify all ALTER TABLE actions involved in these inconsistencies.
> * Update the ALTER TABLE and CREATE TABLE documentation to clearly describe the current behavior for partitioned tables, and (where appropriate) the intended or ideal behavior.
> * Explicitly document the meaning of ONLY for partitioned tables, and note that some actions may behave differently, with details described in each action’s section.
>
> Phase 2: Address actions incrementally
>
> * Work on each ALTER TABLE action individually, recognizing that some may be straightforward while others require more design discussion, coding, and testing.
> * With Phase 1 in place to set expectations, it may not be necessary to complete all actions in a single release.
> * That said, it would still be desirable to keep the work bounded, for example within one or two major releases, to avoid long-term fragmentation.
>
> I’ve included the participants from the previous discussions on CC, in case they want to comment further.
>
To move this patch forward, I’ve been working on a few related patches in parallel:
• [1] Enhances the “ALTER TABLE” documentation to clarify how subcommands behave on partitioned tables. In that discussion, I summarized the current behaviors of all subcommands into 15 categories.
• [2] Adds a NOTICE when ONLY is not specified but the action does not recurse to child partitions.
• [3] Improves the header comments in tablecmds.c to better explain the meaning of “recurse” and “recursing”.
While working on [1], [2] and [3], I gained a deeper understanding of ALTER TABLE behavior. Based on that, v3 is a complete rework compared to v2. The behavior implemented by v3 is:
• Recursion is performed during the preparation phase.
• When ONLY is specified, no recursion occurs; without ONLY, the command recurses to all child partitions.
• All replica identity types are supported, including USING INDEX (which was not supported in v2).
• For non-index replica identities, the existing ATSimpleRecursion() infrastructure is reused; for index-based replica identity, the corresponding index name must be determined for each child partition.
In a previous discussion [4], Dmitry Dolgov pointed out a test case that resulted in a DEADLOCK. I ran that test against v3. The test still fails, but I no longer observe a deadlock; instead, the server now crashes during partition attachment. I will investigate this further.
While working on v3, I also noticed another issue that may be a bug. When creating an index on a partitioned table, the index is automatically created on all partitions. However, if a column type change causes an index rebuild, it appears that the rebuilt index may be missing from child partitions. I will verify this behavior and, if confirmed, discuss it in a separate thread.
At this point, I think v3 is in a reviewable state.
[1] https://postgr.es/m/CAEoWx2%3DmYhCfsnHaN96Qqwq5b0GVS2YgO3zpVqPPRd_iO52wRw%40mail.gmail.com
[2] https://postgr.es/m/CAEoWx2=SLga-xH09Cq_PAvsHhQHrBK+V0vF821JKgzS=Bm0haA@mail.gmail.com
[3] https://postgr.es/m/CAEoWx2n9E6_zxPbqwMpaPuC1C_p4b3y635SjiDuCPSVm8GBjsA@mail.gmail.com
[4] https://www.postgresql.org/message-id/flat/201902041630.gpadougzab7v%40alvherre.pgsql
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Inherit-replica-identity-for-new-and-attached-par.patch | application/octet-stream | 8.8 KB |
| v3-0002-ALTER-TABLE-recurse-REPLICA-IDENTITY-to-partition.patch | application/octet-stream | 13.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David Rowley | 2026-01-26 03:03:19 | Re: unnecessary executor overheads around seqscans |
| Previous Message | Junwang Zhao | 2026-01-26 02:41:22 | Re: [PATCH] Replace COUNT(NULL) with '0'::bigint |