[PATCH] Improving index selection for logical replication apply with replica identity full

From: Ethan Mertz <ethan(dot)mertz(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "onderkalaci(at)gmail(dot)com" <onderkalaci(at)gmail(dot)com>
Subject: [PATCH] Improving index selection for logical replication apply with replica identity full
Date: 2026-05-22 17:18:12
Message-ID: CAA9pdKdMVBrcwORyLs_E-sw4zGzwumGfDn43Kj=FKWY9cASyJg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

I'd like to reopen the discussion on index selection for logical
replication apply for replica identity full. Since PostgreSQL 14, replica
identity full is able to make use of existing indexes [1][2] (authors in
CC) when replicating UPDATE or DELETE operations.

Today, when identifying which index to use for the update or delete, the
first suitable index is chosen by OID order, which generally corresponds to
creation order. If the chosen index has low cardinality, the lookup may
perform no better than a sequential scan. While avoiding replica identity
full is generally recommended, some users need to maintain REPLICA IDENTITY
FULL to support downstream logical consumers that require full row images.
These users would also like performant PostgreSQL to PostgreSQL replication.

I propose improving the index selection heuristic to prefer unique indexes,
favoring those with fewer columns. Previous discussion in the linked
threads avoided invoking the planner for full index selection; the
heuristic I propose serves as a middle ground. A unique index guarantees
that each tuple match requires at most one index scan, and among unique
indexes, fewer columns means a narrower, more efficient lookup. I have
attached a patch implementing this check.

In addition, I've performed some simple performance testing of this patch:

```
CREATE TABLE cardinality_index_test (
id uuid NOT NULL DEFAULT gen_random_uuid(),
is_active boolean NOT NULL DEFAULT true,
payload text NOT NULL DEFAULT repeat('x', 200)
);

CREATE INDEX idx_bad_low_cardinality ON cardinality_index_test (is_active);

CREATE UNIQUE INDEX idx_good_unique ON cardinality_index_test (id);
```

With the patch, the time to replicate 1,000 point updates on a table size
of 1,000,000 dropped from 53 seconds to under 1 second.

Feedback welcome.

Thanks,

Ethan Mertz
SDE, Amazon Web Services

[1]
https://www.postgresql.org/message-id/flat/CACawEhVLqmAAyPXdHEPv1ssU2c%3DdqOniiGz7G73HfyS7%2BnGV4w%40mail.gmail.com

[2]
https://www.postgresql.org/message-id/flat/TYAPR01MB58669D7414E59664E17A5827F522A%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Attachment Content-Type Size
v1-0001-Improve-index-selection-for-REPLICA-IDENTITY-FULL.patch application/x-patch 6.3 KB

Browse pgsql-hackers by date

  From Date Subject
Previous Message Japin Li 2026-05-22 16:24:54 Re: [PATCH] Add pg_get_policy_ddl() function to reconstruct CREATE POLICY statement