Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-08-01 16:21:48
Message-ID: CACawEhXX7JcJVwmbQzd54ehxZTUF5th+XhAz04TPkLVQdT5m+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

As far as I can see, the following is the answer to the only remaining open
discussion in this thread. Let me know if anything is missed.

(b) it appears to me that the patch decides
> >> which index to use the first time it opens the rel (or if the rel gets
> >> invalidated) on subscriber and then for all consecutive operations it
> >> uses the same index. It is quite possible that after some more
> >> operations on the table, using the same index will actually be
> >> costlier than a sequence scan or some other index scan
> >
> >
> > Regarding (b), yes that is a concern I share. And, I was actually
> considering sending another patch regarding this.
> >
> > Currently, I can see two options and happy to hear your take on these
> (or maybe another idea?)
> >
> > - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE
> or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to
> re-create the cache entries. In this case, as far as I can see, we need a
> callback that is called when table "ANALYZE"d, because that is when the
> statistics change. That is the time picking a new index makes sense.
> > However, that seems like adding another dimension to this patch, which I
> can try but also see that committing becomes even harder.
> >
>
> This idea sounds worth investigating. I see that this will require
> more work but OTOH, we can't allow the existing system to regress
> especially because depending on workload it might regress badly. We
> can create a patch for this atop the base patch for easier review/test
> but I feel we need some way to address this point.
>
>
It turns out that we already invalidate the relevant entries
in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates
any of the statistics in pg_class.

The call-stack for analyze is roughly:
do_analyze_rel()
-> vac_update_relstats()
-> heap_inplace_update()
-> if needs to apply any statistical change
-> CacheInvalidateHeapTuple()

And, we register for those invalidations already:
logicalrep_relmap_init() / logicalrep_partmap_init()
-> CacheRegisterRelcacheCallback()

Added a test which triggers this behavior. The test is as follows:
- Create two indexes on the target, on column_a and column_b
- Initially load data such that the column_a has a high cardinality
- Show that we use the index on column_a
- Load more data such that the column_b has higher cardinality
- ANALYZE on the target table
- Show that we use the index on column_b afterwards

Thanks,
Onder KALACI

Attachment Content-Type Size
v4_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/x-patch 56.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2022-08-01 16:30:39 Re: [Commitfest 2022-07] Patch Triage: Waiting on Author
Previous Message Simon Riggs 2022-08-01 16:00:41 Re: Dump/Restore of non-default PKs