Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-12 15:07:21
Message-ID: CACawEhWfqUcS8Hofss73P-20Hp4uMpRk1o3TCXU+wjzjwPhx7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit, all

> >> I think we can add such a test (which relies on existing buggy
> >> behavior) later after fixing the existing bug. For now, it would be
> >> better to remove that test and add it after we fix dropped columns
> >> issue in HEAD.
> >
> >
> > Alright, when I push the next version (hopefully tomorrow), I'll follow
> this suggestion.
> >
>
> Okay, thanks. See, if you can also include your changes in the patch
> wip_for_optimize_index_column_match (after discussed modification).
> Few other minor comments:
>

Sure, done. Please check RemoteRelContainsLeftMostColumnOnIdx() function.

Note that we already have a test for that on SOME NULL VALUES AND MISSING
COLUMN.
Previously we'd check if test_replica_id_full_idy is used. Now, we don't
because it is not
used anymore :) I initially used poll_query_until with idx_scan=0, but that
also seems
confusing to read in the test. It looks like it could be prone to race
conditions as
poll_query_until with idxcan=0 does not guarantee anything.

>
> 1.
> + are enforced for primary keys. Internally, we follow a similar
> approach for
> + supporting index scans within logical replication scope. If there are
> no
>
> I think we can remove the above line: "Internally, we follow a similar
> approach for supporting index scans within logical replication scope."
> This didn't seem useful for users.
>
>
removed

> 2.
> diff --git a/src/backend/executor/execReplication.c
> b/src/backend/executor/execReplication.c
> index bc6409f695..646e608eb7 100644
> --- a/src/backend/executor/execReplication.c
> +++ b/src/backend/executor/execReplication.c
> @@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation
> rel, Relation idxrel,
> if (!AttributeNumberIsValid(table_attno))
> {
> /*
> - * XXX: For a non-primary/unique index with an
> additional
> - * expression, we do not have to continue at
> this point. However,
> - * the below code assumes the index scan is
> only done for simple
> - * column references. If we can relax the
> assumption in the below
> - * code-block, we can also remove the continue.
> + * XXX: Currently, we don't support
> expressions in the scan key,
> + * see code below.
> */
>
>
> I have tried to simplify the above comment. See, if that makes sense to
> you.
>

Makes sense

>
> 3.
> /*
> + * We only need to allocate once. This is allocated within per
> + * tuple context -- ApplyMessageContext -- hence no need to
> + * explicitly pfree().
> + */
>
> We normally don't write why we don't need to explicitly pfree. It is
> good during the review but not sure if it is a good idea to keep it in
> the final code.
>
>
Sounds good, applied

4. I have modified the proposed commit message as follows, see if that
> makes sense to you, and let me know if I missed anything especially
> the review/author credit.
>
> Allow the use of indexes other than PK and REPLICA IDENTITY on the
> subscriber.
>
> Using REPLICA IDENTITY FULL on the publisher can lead to a full table
> scan per tuple change on the subscription when REPLICA IDENTITY or PK
> index is not available. This makes REPLICA IDENTITY FULL impractical
> to use apart from some small number of use cases.
>
> This patch allows using indexes other than PRIMARY KEY or REPLICA
> IDENTITY on the subscriber during apply of update/delete. The index
> that can be used must be a btree index, not a partial index, and it
> must have at least one column reference (i.e. cannot consist of only
> expressions). We can uplift these restrictions in the future. There is
> no smart mechanism to pick the index. If there is more than one index
> that
> satisfies these requirements, we just pick the first one. We discussed
> using some of the optimizer's low-level APIs for this but ruled it out
> as that can be a maintenance burden in the long run.
>
> This patch improves the performance in the vast majority of cases and
> the improvement is proportional to the amount of data in the table.
> However, there could be some regression in a small number of cases
> where the indexes have a lot of duplicate and dead rows. It was
> discussed that those are mostly impractical cases but we can provide a
> table or subscription level option to disable this feature if
> required.
>
> Author: Onder Kalaci
> Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
> Hayato, Amit Kapila
> Discussion:
> https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com
>
>
I also see 2 mails/reviews from Wang wei, but I'm not sure what qualifies
as "reviewer" for this group. Should we
add that name as well? I think you can guide us on this.

Other than that, I only fixed one extra new line between 'that' and
"satisfies'. Other than that, it looks pretty good!

Thanks,
Onder

Attachment Content-Type Size
v44_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/octet-stream 47.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zheng Li 2023-03-12 15:24:13 Re: Support logical replication of DDLs
Previous Message Önder Kalacı 2023-03-12 15:07:17 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher