Quick Links

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From:	Önder Kalacı <onderkalaci(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date:	2022-07-19 08:16:36
Message-ID:	CACawEhXgP_Kj_1iyNAp16MYos4Anrtz+OZVtj2z-QOPGdPCt_A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi, thanks for your reply.

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, 18 Tem 2022 Pzt, 08:29 tarihinde
şunu yazdı:

> On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci(at)gmail(dot)com>
> wrote:
> >
> > Hi hackers,
> >
> >
> > It is often not feasible to use `REPLICA IDENTITY FULL` on the
> publication, because it leads to full table scan
> >
> > per tuple change on the subscription. This makes `REPLICA IDENTITY FULL`
> impracticable -- probably other
> >
> > than some small number of use cases.
> >
>
> IIUC, this proposal is to optimize cases where users can't have a
> unique/primary key for a relation on the subscriber and those
> relations receive lots of updates or deletes?
>

Yes, that is right.

In a similar perspective, I see this patch useful for reducing the "use
primary key/unique index" requirement to "use any index" for a reasonably
performant logical replication with updates/deletes.

>
> It seems that in favorable cases it will improve performance but we
> should consider unfavorable cases as well. Two things that come to
> mind in that regard are (a) while choosing index/seq. scan paths, the
> patch doesn't account for cost for tuples_equal() which needs to be
> performed for index scans, (b) it appears to me that the patch decides
> which index to use the first time it opens the rel (or if the rel gets
> invalidated) on subscriber and then for all consecutive operations it
> uses the same index. It is quite possible that after some more
> operations on the table, using the same index will actually be
> costlier than a sequence scan or some other index scan
>

Regarding (b), yes that is a concern I share. And, I was actually
considering sending another patch regarding this.

Currently, I can see two options and happy to hear your take on these (or
maybe another idea?)

- Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or
CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to
re-create the cache entries. In this case, as far as I can see, we need a
callback that is called when table "ANALYZE"d, because that is when the
statistics change. That is the time picking a new index makes sense.
However, that seems like adding another dimension to this patch, which I
can try but also see that committing becomes even harder. So, please see
the next idea as well.

- Ask users to manually pick the index they want to use: Currently, the
main complexity of the patch comes with the planner related code. In fact,
if you look into the logical replication related changes, those are
relatively modest changes. If we can drop the feature that Postgres picks
the index, and provide a user interface to set the indexes per table in the
subscription, we can probably have an easier patch to review & test. For
example, we could add `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i`
type of an API. This also needs some coding, but probably much simpler than
the current code. And, obviously, this pops up the question of can users
pick the right index? Probably not always, but at least that seems like a
good start to use this performance improvement.

Thoughts?

Thanks,
Onder Kalaci

In response to

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher at 2022-07-18 06:29:01 from Amit Kapila

Responses

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher at 2022-07-20 04:50:45 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2022-07-19 08:17:18	Re: Costing elided SubqueryScans more nearly correctly
Previous Message	Kyotaro Horiguchi	2022-07-19 08:13:08	Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns