Re: [PATCH] Improving index selection for logical replication apply with replica identity full

From: Ethan Mertz <ethan(dot)mertz(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "onderkalaci(at)gmail(dot)com" <onderkalaci(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [PATCH] Improving index selection for logical replication apply with replica identity full
Date: 2026-06-29 13:51:40
Message-ID: CAA9pdKf1jUvFDzRxiEYjzUh4Rq-cUOb7QcnpvB0OFmtZbOEGFw@mail.gmail.com
Lists: pgsql-hackers

Hi,

> I want to make sure we don't regress in the case where the unique
> index has more bloat than the non-unique index. I experimented with a
> small dataset where the unique index grew to 4.5 GB due to bloat and
> observed that the patched was about 20% slower (a couple of seconds at
> this scale, but in production with larger tables, more concurrent
> activity, more bloat and limited memory, I would expect the gap to be
> wider). Would you be able to provide some additional data points on
> this before we proceed further? That would help confirm the heuristic
> holds up across different bloat conditions.

I agree that choosing a bloated unique index could lead to worse
performance than a non-bloated non-unique index in certain cases.
However, I think there are many cases where the larger, more bloated
index would perform significantly better than a less bloated index with
worse selectivity. Even without exact numbers, it should be clear that
the example from my original email is one of those cases. Choosing an
index based on size alone, without taking other statistics into account,
would likely lead to many wrong decisions.
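
To make the selectivity point concrete, here is a rough back-of-the-envelope
sketch in Python (not PostgreSQL code, and not a measurement). The function
name, the B-tree fanout of 200, and the compact non-unique index with 1,000
rows per key are all hypothetical assumptions chosen for illustration; only
the 4.5 GB figure comes from the quoted test. The model counts a B-tree
descent plus one heap fetch per tuple matching the index key, and it
deliberately ignores caching and I/O effects, which is where bloat hurts.

```python
import math

PAGE_SIZE = 8192  # PostgreSQL's default block size in bytes


def pages_touched_per_lookup(index_bytes, matches_per_key, fanout=200):
    """Toy estimate of pages visited to locate one row during apply.

    Assumes a B-tree descent of ceil(log_fanout(index pages)) pages plus
    one heap fetch per tuple matching the index key. Caching, leaf-page
    scans, and visibility checks are deliberately ignored.
    """
    index_pages = max(2, index_bytes // PAGE_SIZE)
    depth = max(1, math.ceil(math.log(index_pages, fanout)))
    return depth + matches_per_key


# Bloated unique index: 4.5 GB, exactly one candidate row per key.
bloated_unique = pages_touched_per_lookup(int(4.5 * 1024**3), matches_per_key=1)

# Hypothetical compact non-unique index: ~78 MB, 1,000 rows share each key.
compact_non_unique = pages_touched_per_lookup(10_000 * PAGE_SIZE, matches_per_key=1_000)

print(bloated_unique, compact_non_unique)
```

Under these assumptions the extra tree depth caused by bloat is small
compared with the number of candidate tuples a poorly selective index
returns, although a real comparison would also need to account for memory
pressure and cache misses.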

Moreover, I would argue that even if choosing a unique index led to
worse performance, it should not be considered a regression. Today
index selection is essentially random, so there is no guarantee about
which index is chosen. I'd argue that a savvy user must assume that the
worst index is chosen when reasoning about performance. In addition,
given that this patch will likely only be applied in a new major
version, any stability in the index ordering used for selection would
change during the dump and restore anyway.

I'd also reiterate that this is a small, incremental improvement that I
found would help in a few situations I have observed in user
workloads. I don't think it excludes any future optimizations that
include more factors such as size/bloat, but those must be considered in
combination with other statistics. I'd be interested in looking into and
helping out with the development of those features in the future.

Best,
Ethan
SDE, Amazon Web Services
