| From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | Euler Taveira <euler(at)eulerto(dot)com>, GRANT ZHOU <grantzhou(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Subject: | Re: Improve logical replication usability when tables lack primary keys |
| Date: | 2026-04-08 10:22:14 |
| Message-ID: | CAJpy0uAHiwssGCgE54zHgDPPhV4h+O5aeM03D-go8EjY1dNPCg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Dec 19, 2025 at 1:39 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>
>
> > On Dec 18, 2025, at 22:49, Euler Taveira <euler(at)eulerto(dot)com> wrote:
> >
> > On Wed, Dec 17, 2025, at 6:43 PM, GRANT ZHOU wrote:
> >> On Wed, Dec 17, 2025 at 12:50 PM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> >>> Each table needs to say what's its row identifier. The user created a table
> >>> without primary key. Well, create a primary key. There are dozens of thousands
> >>> of objects. Use a script.
> >> However, I’d like to share a user perspective regarding the "use a
> >> script" approach. The main value of `FOR TABLES IN SCHEMA` is
> >> *in-database automation*. If users still need to maintain external
> >> scripts to monitor and `ALTER` new tables to prevent replication
> >> errors, it significantly diminishes the value of that automation.
> >>
> >
> > As I tried to explain in the previous email, the problem with FOR ALL TABLES
> > and FOR TABLES IN SCHEMA syntax is that the is no catalog information about the
> > relations; the list of relations is collected at runtime.
> >
> > When I suggested "use a script" I was referring to fix the logical replication
> > setup regarding the lack of primary key. There is no need to have an automation
> > outside the database, use an event trigger. If your lazy user doesn't create
> > the primary key, assign REPLICA IDENTITY FULL. Something like
> >
> > -- This example is far from being a complete solution for fixing the lack of
> > -- primary key in a logical replication scenario.
> > -- ALTER TABLE should be supported too
> > CREATE OR REPLACE FUNCTION event_trigger_for_replica_identity()
> > RETURNS event_trigger LANGUAGE plpgsql AS $$
> > DECLARE
> > obj record;
> > rec record;
> > ricnt integer := 0;
> > BEGIN
> > FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
> > LOOP
> > IF obj.command_tag = 'CREATE TABLE' THEN
> > SELECT COUNT(*) INTO ricnt FROM pg_index WHERE indrelid = obj.objid AND indisprimary;
> > RAISE NOTICE 'ricnt: %', ricnt;
> > IF ricnt = 0 THEN
> > EXECUTE 'ALTER TABLE ' || obj.object_identity || ' REPLICA IDENTITY FULL';
> > END IF;
> > END IF;
> > END LOOP;
> > END;
> > $$;
> >
> > CREATE EVENT TRIGGER event_trigger_for_replica_identity
> > ON ddl_command_end
> > EXECUTE FUNCTION event_trigger_for_replica_identity();
> >
> > CREATE TABLE event_trigger_test_1 (a int);
> > \d+ event_trigger_test_1
> > CREATE TABLE event_trigger_test_2 (a int primary key);
> > \d+ event_trigger_test_2
> > CREATE TABLE event_trigger_test_3 (a int, b text not null, primary key(b));
> > \d+ event_trigger_test_3
> > --ALTER TABLE event_trigger_test_3 DROP CONSTRAINT event_trigger_test_3_pkey;
> > --\d+ event_trigger_test_3
> >
> > DROP EVENT TRIGGER event_trigger_for_replica_identity;
> > DROP FUNCTION event_trigger_for_replica_identity;
> > DROP TABLE event_trigger_test_1, event_trigger_test_2, event_trigger_test_3;
> >
> > 8<----------------------------------------------------------------------------8<
> >
> >> Additionally, tables without Primary Keys are valid SQL and extremely
> >> common in enterprise environments (e.g., audit logs, data warehousing).
> >> In large-scale deployments, enforcing PKs on every single table isn't
> >> always practical.
> >>
> >
> > I'm not saying users shouldn't create tables without a primary key. I'm arguing
> > that this decision should take into account what adjustments need to be made to
> > use these tables in logical replication.
> >
> >>
> >> I think the goal of this proposal is not to change the underlying table
> >> property design, but rather to seek a mechanism (like a Publication
> >> option) to ensure this automation functions safely without external
> >> intervention. It is simply about allowing the database to handle these
> >> valid, common scenarios gracefully when automation is enabled.
> >>
> >
> > You didn't get it. You already have one property to handle it and you are
> > proposing to add a second property to handle it.
> >
> > I think you are pursuing the wrong solution. IMO we need a solution to enforce
> > that the logical replication contract is valid. If you create or modify a table
> > that is part of a publication, there is no validation that that table complies
> > with the publication properties (update and delete properties should require an
> > appropriate replica identity). We should close the gaps in both publication and
> > table.
> >
>
> If I summarize Euler’s position in short words: discipline over convenience. I actually strongly agree with that. In PG we generally prefer explicit over implicit behavior, and predictability over magic.
>
> Based on the discussion so far, I think we share the following design goals:
>
> 1) Keep replica identity as a table property.
> 2) Avoid silent runtime failures when FOR TABLES IN SCHEMA pulls in tables without primary keys.
> 3) Avoid global or implicit behavior changes.
> 4) Preserve explicit opt-in for higher WAL cost.
> 5) Keep the logical replication contract explicit and enforceable.
>
> I’ve been thinking about whether adding a new replica identity could meet these goals.
>
> Today we have four replica identities: DEFAULT (PK, fallback to NONE), INDEX, FULL, and NONE.
>
> What if we introduce a new replica identity, tentatively called “FORCE”: PK with fallback to FULL. (Let’s keep our focus on the design, not argue the name for now.)
>
> With this approach:
>
> 1) Replica identity remains a table property.
> 2) Publication membership is still evaluated at runtime, so FOR TABLES IN SCHEMA is not special-cased.
> 3) No new GUCs are required.
> 4) The user must explicitly opt in by setting the replica identity. Once FORCE is chosen, adding or dropping a primary key later does not silently break UPDATE/DELETE replication.
> 5) The logical replication contract remains explicit; the table declares that it is safe for UPDATE/DELETE replication even without a PK, at the cost of higher WAL volume.
>
> This feels like a small, explicit extension of the existing RI semantics. Notably, REPLICA IDENTITY DEFAULT already has conditional behavior (PK fallback
> to NONE), so conditional RI behavior is not new, this would just make a different fallback explicit and user-chosen.
Even though I like the idea of keeping replica identity as a table
property, this raises a question about the need for a new FORCE
option. With this approach, the user must explicitly opt REPLICA
IDENTITY FORCE for tables without a primary key (pt 4 in your design).
But if the user is already expected to take an explicit action, they
could simply set REPLICA IDENTITY FULL themselves at table creation
time or via ALTER TABLE. From what I understood, the original
requirement was to ensure that tables without a primary key do not
silently fail replication, and that this is handled automatically
without manual intervention. In contrast, the FORCE approach still
requires users to explicitly configure replica identity for each
table, which seems to reintroduce the same operational burden.
For example, instead of:
ALTER TABLE t1 REPLICA IDENTITY FORCE;
the user could simply do:
ALTER TABLE t1 REPLICA IDENTITY FULL
By the time, the user is explicitly configuring replica identity, they
would already need to understand the replication semantics, so it's
not clear what additional benefit FORCE provides over FULL. Am I
missing something here?
thanks
Shveta
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jim Jones | 2026-04-08 10:41:00 | Re: Fix bug with accessing to temporary tables of other sessions |
| Previous Message | Tomas Vondra | 2026-04-08 10:20:53 | Re: Adding REPACK [concurrently] |