| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Improve logical replication usability when tables lack primary keys |
| Date: | 2025-12-16 06:47:26 |
| Message-ID: | D4A3A4D7-2F51-4F0D-9FDE-95E9C318148E@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On Dec 15, 2025, at 13:48, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Dec 15, 2025 at 9:06 AM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>
>>> On Dec 15, 2025, at 11:28, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>>>
>>> On Tue, Nov 11, 2025 at 6:11 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>
>>>> Hi Amit,
>>>>
>>>> Thanks for asking.
>>>>
>>>>> On Nov 11, 2025, at 19:18, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>>>
>>>>> On Mon, Nov 10, 2025 at 1:36 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>>>>>>
>>>>>> * BACKGROUND
>>>>>>
>>>>>> This requirement comes from several users operating large deployments, particularly in HIS (Hospital Information Systems). The situation can be summarized as follows:
>>>>>>
>>>>>> - A central DB operations team maintains the main database and configures logical replication for all tables.
>>>>>> - Multiple third-party application vendors are allowed to create new tables in that database.
>>>>>> - Some of these newly created tables lack a primary key. Since logical replication with `REPLICATION IDENTITY DEFAULT` requires a primary key, such tables silently fail to replicate.
>>>>>> - The DB operations team must then spend significant effort identifying the affected tables and correcting them manually.
>>>>>>
>>>>>
>>>>> Can you share an example of how we silently fail to replicate? Won't
>>>>> in such cases UPDATE/DELETE will anyway raise an ERROR?
>>>>>
>>>>
>>>> Yes, UPDATE/DELETE will fail. That’s the easy case to expose the error. Actually my patch will allow the update/delete.
>>>>
>>>> However, some tables, like dictionary tables, they are important, but don’t have much update/delete, they may silently fail to replicate.
>>>
>>> But other than UPDATE/DELETE for what operation we need RI, I mean
>>> INSERT would work without any RI and UPDATE/DELETE will fail on the
>>> publisher itself without setting RI, so can you explain the exact case
>>> where it will silently fail to replicate?
>>>
>>
>> Thanks for asking. When fallback to FULL, UPDATE/DELETE will be allowed in the publisher side. In my first email, attached v1 patch is a PoC that has implemented the logic.
>>
>
> So, without patch, there is no way we can silently replicate the
> UPDATE/DELETE. Ideally, users should alter the tables and make RI as
> FULL in such cases if they don't have PK for such tables. Falling back
> to FULL for DEFAULT when the table doesn't have PK based on GUC has a
> downside that it will increase WAL volume by a large amount.
I agree that this downside exists, but it is an inherent cost that users must accept if they choose to replicate all tables, including those without a primary key. In practice, users who opt into such a configuration are typically aware of the WAL overhead and make that trade-off consciously.
> I think it should be done specific to tables that users want to replicate.
That is why I mentioned earlier that the new GUC should only be configurable at the database level (via ALTER DATABASE). However, I agree that there is still a risk that a user could mistakenly set it in postgresql.conf, thereby making it effective for the entire cluster.
> I don't know what is a good way to give to users who don't want to do
> the required setup but if we really want to provide something, it is
> better to allow such a thing via the publication option instead.
Using a publication-level option could also work. One complication, however, is that a table can belong to multiple publications. For example, if table_a belongs to both pub_a and pub_b, and only pub_a is configured with fallback_to_full while pub_b keeps the default behavior (fallback_to_none), then the effective behavior for table_a would need to remain fallback_to_none, meaning that UPDATE/DELETE would still not be allowed if table_a has not a primary key.
> I think it would be good to do such an enhancement if we have more
> community support and some other users also appreciate such a feature.
> Otherwise, adding something which is specific to a particular user
> sounds like a recipe of maintenance burden especially when we already
> provide a way to achieve the same thing as is required by the user.
Let me elaborate on that point.
My company has a very large user base in China, with over 100K deployments across multiple industries. However, there is currently a significant gap between this large user population and direct participation in the PG community. I joined the company in July this year as a full-time contributor to the PG community, and one of my responsibilities is to help bridge this gap and bring real-world user feedback into community discussions.
As I mentioned in my earlier email, this requirement comes from large-scale deployments. The database owners in these environments have operational models that may not always align with what we consider the ideal or fully optimized setup, but they are the result of years of accumulated practice and operational experience. For these users, the proposed feature would significantly simplify their day-to-day operations and reduce operational friction.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Bertrand Drouvot | 2025-12-16 06:49:13 | Change the signature of pgstat_report_vacuum() so that it's passed a Relation |
| Previous Message | Xuneng Zhou | 2025-12-16 06:42:00 | Re: Implement waiting for wal lsn replay: reloaded |