Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-02 09:30:20
Message-ID: CACawEhUQjxYU=UAamdnfaJU0uGJFXzhxrOT-km343kZYpeCGfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

>
>
> Few comments:
> ===============
> 1.
> + identity. When replica identity <literal>FULL</literal> is specified,
> + indexes can be used on the subscriber side for searching the rows.
> These
> + indexes should be btree,
>
> Why only btree and not others like a hash index? Also, there should be
> some comments in FindUsableIndexForReplicaIdentityFull() to explain
> the choices.
>

I updated the comment(s).

For a more technical reference, we have these restrictions, because we rely
on
RelationFindReplTupleByIndex() which is designed to handle PK/RI. And,
RelationFindReplTupleByIndex() is written in a way that only expects
indexes with these limitations.

In order to keep the changes as small as possible, I refrained from
relaxing this
limitation for now. I'm definitely up to working on this for relaxing these
limitations, and practically allowing more cases for non-unique indexes.

>
> 2.
> - * This is not generic routine, it expects the idxrel to be replication
> - * identity of a rel and meet all limitations associated with that.
> + * This is not a generic routine - it expects the idxrel to be an index
> + * that planner would choose if the searchslot includes all the columns
> + * (e.g., REPLICA IDENTITY FULL on the source).
> */
> -static bool
> +static int
>
> This comment is not clear to me. Which change here makes the
> expectation like that? Which planner function/functionality are you
> referring to here?
>

Ops, planner related comments are definitely stale. As you might
remember, in the earlier iterations of this patch, we had some
planner functions to pick indexes for us.

Anyway, I think even for that version of the patch, this comment was
wrong. Updated now, does that look better?

>
> 3.
> +/*
> + * Given a relation and OID of an index, returns true if
> + * the index is relation's primary key's index or
> + * relation's replica identity index.
>
> It seems the line length is a bit off in the above comments. There
> could be a similar mismatch in other places. You might want to run
> pgindent.
>

Makes sense, run pgindent. But it didn't fix this specific instance
automatically,
I changed that manually.

>
> 4.
> +}
> +
> +
> +/*
> + * Returns an index oid if there is an index that can be used
>
> Spurious empty line.
>

fixed

>
> 5.
> - /*
> - * We are in error mode so it's fine this is somewhat slow. It's better to
> - * give user correct error.
> - */
> - if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
> + /* Give user more precise error if possible. */
> + if (OidIsValid(rel->usableIndexOid))
> {
> ereport(ERROR,
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>
> Is this change valid? I mean this could lead to the error "publisher
> did not send replica identity column expected by the logical
> replication target relation" when it should have given an error:
> "logical replication target relation \"%s.%s\" has neither REPLICA
> IDENTITY index nor PRIMARY ...
>
>
Hmm, that's right, we'd get a wrong error message.

I spent quite a bit of time trying to understand whether we'd
need anything additional after this patch regarding this function,
but my current understanding is that we should just leave the
check as-is. It is mainly because when REPLICA IDENTITY is
full, there is no need to check anything further (other than the
check that is at the bottom of the function)

Attached are both patches: the main patch, and the patch that
optionally disables the index scans. Let's discuss the necessity
for the second patch in the lights of the data we collect with
some more tests.

Thanks,
Onder

Attachment Content-Type Size
v26_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/octet-stream 64.4 KB
v26_0001_optionally_disable_index.patch application/octet-stream 51.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-03-02 09:37:27 Re: Move defaults toward ICU in 16?
Previous Message Drouvot, Bertrand 2023-03-02 09:20:09 Re: Minimal logical decoding on standbys