Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-07-20 14:49:32
Message-ID: CACawEhUZ=7nT=KWGza+ME0ztx=nczzOpdAQELLqr9KpcyKOCtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> > - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE
> or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to
> re-create the cache entries. In this case, as far as I can see, we need a
> callback that is called when table "ANALYZE"d, because that is when the
> statistics change. That is the time picking a new index makes sense.
> > However, that seems like adding another dimension to this patch, which I
> can try but also see that committing becomes even harder.
> >
>
> This idea sounds worth investigating. I see that this will require
> more work but OTOH, we can't allow the existing system to regress
> especially because depending on workload it might regress badly.

Just to note if that is not clear: This patch avoids (or at least aims to
avoid assuming no bugs) changing the behavior of the existing systems with
PRIMARY KEY or UNIQUE index. In that case, we still use the relevant
indexes.

> We
> can create a patch for this atop the base patch for easier review/test
> but I feel we need some way to address this point.
>
>
One another idea could be to re-calculate the index, say after *N*
updates/deletes for the table. We may consider using subscription_parameter
for getting N -- with a good default, or even hard-code into the code. I
think the cost of re-calculating should really be pretty small compared to
the other things happening during logical replication. So, a sane default
might work?

If you think the above doesn't work, I can try to work on a separate patch
which adds something like "analyze invalidation callback".

> >
> > - Ask users to manually pick the index they want to use: Currently, the
> main complexity of the patch comes with the planner related code. In fact,
> if you look into the logical replication related changes, those are
> relatively modest changes. If we can drop the feature that Postgres picks
> the index, and provide a user interface to set the indexes per table in the
> subscription, we can probably have an easier patch to review & test. For
> example, we could add `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i`
> type of an API. This also needs some coding, but probably much simpler than
> the current code. And, obviously, this pops up the question of can users
> pick the right index?
> >
>
> I think picking the right index is one point and another is what if
> the subscription has many tables (say 10K or more), doing it for
> individual tables per subscription won't be fun. Also, users need to
> identify which tables belong to a particular subscription, now, users
> can find the same via pg_subscription_rel or some other way but doing
> this won't be straightforward for users. So, my inclination would be
> to pick the right index automatically rather than getting the input
> from the user.
>

Yes, all makes sense.

>
> Now, your point related to planner code in the patch bothers me as
> well but I haven't studied the patch in detail to provide any
> alternatives at this stage. Do you have any other ideas to make it
> simpler or solve this problem in some other way?
>
>
One idea I tried earlier was to go over the existing indexes and on the
table, then get the IndexInfo via BuildIndexInfo(). And then, try to find a
good heuristic to pick an index. In the end, I felt like that is doing a
sub-optimal job, requiring a similar amount of code of the current patch,
and still using the similar infrastructure.

My conclusion for that route was I should either use a very simple
heuristic (like pick the index with the most columns) and have a suboptimal
index pick, OR use a complex heuristic with a reasonable index pick. And,
the latter approach converged to the planner code in the patch. Do you
think the former approach is acceptable?

Thanks,
Onder

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2022-07-20 14:50:44 Remove useless arguments in ReadCheckpointRecord().
Previous Message Tom Lane 2022-07-20 14:20:04 Re: pgsql: Default to hidden visibility for extension libraries where possi