Re: [PATCH] Improving index selection for logical replication apply with replica identity full

From: Ethan Mertz <ethan(dot)mertz(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "onderkalaci(at)gmail(dot)com" <onderkalaci(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PATCH] Improving index selection for logical replication apply with replica identity full
Date: 2026-06-23 21:09:15
Message-ID: CAA9pdKfACmWdDakOvMpjzfT9ikXWc4L5UNeUi8NABe6QwPMcfQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

> On the heuristic itself, I am only mildly in favor, and I want to be
> honest about how narrow the benefit is. It only helps when a usable
> unique index already exists on the subscriber but is not picked first.
>
> But in that case the correct answer is REPLICA IDENTITY USING INDEX (or
> a primary key) on that index, which we already recommend. The case that
> really pushes people to use REPLICA IDENTITY FULL: no unique key is
> possible, only non-unique indexes is exactly the case this patch
> leaves unchanged. Even Ethan's own benchmark uses a table that has a
> unique index on "id", which would be better served by setting it as the
> replica identity.
>
> So I would describe this as a small, low-risk improvement to the default
> choice, I am fine with it on that basis.

I fully agree with this assessment of the change. It is both convenient and
simple for the apply worker to make a clearly better choice if the user
hasn't specified the correct index to use as the replica identity. To
further justify this patch, we have seen real users make this mistake,
which then caused them pain through increased replication lag.

After some thought, I decided it would be best to align the change better
with this goal (making a simple decision), and therefore I removed the
logic to choose based on the number of key columns. Thus, I propose a
new patch (attached to this email) which only selects the first unique
index and returns early. This may partially address the feedback around
looping through the indexes.

Furthermore, this simplification makes the behavior more focused and
easier for users to understand when multiple indexes are involved.
Incorporating other aspects of the index (including the key column logic
which I had in v1-v3) would likely make the behavior less intuitive for
users.
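
To make the rule concrete, here is a minimal standalone sketch of the
selection logic. It is not the code in the attached patch: the struct and
function names are hypothetical and do not correspond to PostgreSQL
internals, and the fallback to the first usable non-unique index when no
unique index is usable is my assumption about how the no-unique-index
case stays unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an index descriptor; not a PostgreSQL type. */
typedef struct IndexInfoSketch
{
    unsigned int id;        /* index identifier; 0 means "no index" */
    bool         usable;    /* index can serve the apply-side lookup */
    bool         is_unique; /* index enforces uniqueness */
} IndexInfoSketch;

/*
 * Return the id of the first usable unique index, returning as soon as one
 * is found. If no usable unique index exists, fall back to the first usable
 * index of any kind (assumed fallback). Returns 0 if nothing is usable.
 */
unsigned int
pick_index_sketch(const IndexInfoSketch *indexes, size_t n)
{
    unsigned int fallback = 0;

    assert(indexes != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (!indexes[i].usable)
            continue;

        /* First usable unique index wins; stop scanning here. */
        if (indexes[i].is_unique)
            return indexes[i].id;

        /* Remember only the first usable non-unique index. */
        if (fallback == 0)
            fallback = indexes[i].id;
    }

    return fallback;
}
```

With this shape the result depends only on index order and uniqueness,
which is the property I am arguing makes the behavior easy to explain.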

> Yes, I agreed it's not a serious problem. just I wanted to see such the
> micro bench.

ACK. I will perform some tests on tables with many indexes to see if there
is any performance degradation, and I will share the results shortly.

> It might be worth factoring in the index size when more than one index
> is usable unless others think otherwise. Since the replica identity
> index is only re-picked on relcache invalidation, the choice could go
> stale as bloat grows, so the apply worker might need to re-check the
> replica identity index choice periodically.

I partially spoke on this point earlier in my message, but my opinion is
that apply should either keep the heuristic simple or go into full query
planning. In addition, adding relation size to the heuristic would make
the behavior both dynamic and less predictable, which might be difficult
for users to understand.

Thank you,
Ethan Mertz
SDE, Amazon Web Services

Attachment: v4-0001-Improve-index-selection-for-REPLICA-IDENTITY-FULL.patch (application/octet-stream, 5.9 KB)
