| From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
|---|---|
| To: | Ethan Mertz <ethan(dot)mertz(at)gmail(dot)com> |
| Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "onderkalaci(at)gmail(dot)com" <onderkalaci(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Subject: | Re: [PATCH] Improving index selection for logical replication apply with replica identity full |
| Date: | 2026-06-23 20:39:14 |
| Message-ID: | CALj2ACXDejPQPgEG+1zyG9LdH7pkvHfekh2hDd8OW_58gJA-rQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On Wed, Jun 17, 2026 at 9:18 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> I'm not sure we need another option for this. My concern is that just
> choosing a unique index over a non-unique one can lead to suboptimal
> apply performance if the unique index has more bloat.
I spent some time experimenting with the patch to see whether it can
end up choosing a more-bloated unique index when a less-bloated
non-unique index that can find the same row is available, and whether
that makes the apply slower.
When the apply worker looks up the local row to update, it scans the
chosen index using a dirty snapshot and walks matching row versions
until it reaches the live one. The slow part is how many dead versions
it walks, not the tree height. If updates change a non-unique index
column but not the unique index's columns, the non-unique index's
entries spread across different keys while the unique index's pile up
under the same key, making its chain longer.
I created a non-unique index (chosen today by creation order) and a
unique index (which the patch would prefer), both giving one row per
search key. I bloated both - the unique index grew to 4.5 GB against
450 MB for the non-unique one. I then replicated 1000 point updates
spread across the table. Patched spent 13s in the index lookup against
11s for the HEAD, 20% slower. The gap is small for this workload, but
I expect it to get bigger at scale - update-heavy workloads, longer
version chains, wider index keys, indexes of hundreds of GBs size with
limited memory, and concurrent activity on the subscriber etc.
It might be worth factoring in the index size when more than one index
is usable unless others think otherwise. Since the replica identity
index is only re-picked on relcache invalidation, the choice could go
stale as bloat grows, so the apply worker might need to re-check the
replica identity index choice periodically.
Happy to hear thoughts on this.
--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Heikki Linnakangas | 2026-06-23 20:40:20 | Little checksum worker cleanups |
| Previous Message | Peter Geoghegan | 2026-06-23 20:06:13 | Re: Problems with get_actual_variable_range's VISITED_PAGES_LIMIT |