Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-08-23 16:24:42
Message-ID: CACawEhXr+CYdfr_0dE8F=2Sm3hSFOw5N8wKKPY+CyO7oBoMyAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for the review!

>
> 1.
> In FilterOutNotSuitablePathsForReplIdentFull(), is
> "nonPartialIndexPathList" a
> good name for the list? Indexes on only expressions are also be filtered.
>
> +static List *
> +FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
> +{
> + ListCell *lc;
> + List *nonPartialIndexPathList = NIL;
>
>
Yes, true. We only started filtering the non-partial ones first. Now
changed to *suitableIndexList*, does that look right?

> 2.
> +typedef struct LogicalRepPartMapEntry
> +{
> + Oid partoid; /*
> LogicalRepPartMap's key */
> + LogicalRepRelMapEntry relmapentry;
> + Oid usableIndexOid; /* which index to use?
> (Invalid when no index
> + * used) */
> +} LogicalRepPartMapEntry;
>
> For partition tables, is it possible to use relmapentry->usableIndexOid to
> mark
> which index to use? Which means we don't need to add "usableIndexOid" to
> LogicalRepPartMapEntry.
>
>
My intention was to make this explicit so that it is clear that partitions
can explicitly own indexes.

But I tried your suggested refactor, which looks good. So, I changed it.

Also, I realized that I do not have a test where the partition has an index
(not inherited from the parent), which I also added now.

> 3.
> It looks we should change the comment for FindReplTupleInLocalRel() in this
> patch.
>
> /*
> * Try to find a tuple received from the publication side (in
> 'remoteslot') in
> * the corresponding local relation using either replica identity index,
> * primary key or if needed, sequential scan.
> *
> * Local tuple, if found, is returned in '*localslot'.
> */
> static bool
> FindReplTupleInLocalRel(EState *estate, Relation localrel,
>
>
I made a small change, just adding "index". Do you expect a larger change?

> 4.
> @@ -2030,16 +2017,19 @@ apply_handle_delete_internal(ApplyExecutionData
> *edata,
> {
> EState *estate = edata->estate;
> Relation localrel = relinfo->ri_RelationDesc;
> - LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
> + LogicalRepRelMapEntry *targetRel = edata->targetRel;
> + LogicalRepRelation *remoterel = &targetRel->remoterel;
> EPQState epqstate;
> TupleTableSlot *localslot;
>
> Do we need this change? I didn't see any place to use the variable
> targetRel
> afterwards.
>

Seems so, changed it back.

> 5.
> + if (!AttributeNumberIsValid(mainattno))
> + {
> + /*
> + * There are two cases to consider. First, if the
> index is a primary or
> + * unique key, we cannot have any indexes with
> expressions. So, at this
> + * point we are sure that the index we deal is not
> these.
> + */
> + Assert(RelationGetReplicaIndex(rel) !=
> RelationGetRelid(idxrel) &&
> + RelationGetPrimaryKeyIndex(rel) !=
> RelationGetRelid(idxrel));
> +
> + /*
> + * For a non-primary/unique index with an
> expression, we are sure that
> + * the expression cannot be used for replication
> index search. The
> + * reason is that we create relevant index paths
> by providing column
> + * equalities. And, the planner does not pick
> expression indexes via
> + * column equality restrictions in the query.
> + */
> + continue;
> + }
>
> Is it possible that it is a usable index with an expression? I think
> indexes
> with an expression has been filtered in
> FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index
> with
> an expression, maybe we shouldn't use "continue" here.
>

Ok, I think there are some confusing comments in the code, which I updated.
Also, added one more explicit Assert to make the code a little more
readable.

We can support indexes involving expressions but not indexes that are only
consisting of expressions. FilterOutNotSuitablePathsForReplIdentFull()
filters out the latter, see IndexOnlyOnExpression().

So, for example, if we have an index as below, we are skipping the
expression while building the index scan keys:

CREATE INDEX people_names ON people (firstname, lastname, (id || '_' ||
sub_id));

We can consider removing `continue`, but that'd mean we should also adjust
the following code-block to handle indexprs. To me, that seems like an edge
case to implement at this point, given such an index is probably not
common. Do you think should I try to use the indexprs as well while
building the scan key?

I'm mostly trying to keep the complexity small. If you suggest this
limitation should be lifted, I can give it a shot. I think the limitation I
leave here is with a single sentence: *The index on the subscriber can only
use simple column references. *

> 6.
> In the following case, I got a result which is different from HEAD, could
> you
> please look into it?
>
> -- publisher
> CREATE TABLE test_replica_id_full (x int);
> ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
>
> -- subscriber
> CREATE TABLE test_replica_id_full (x int, y int);
> CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
> CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> port=5432' PUBLICATION tap_pub_rep_full;
>
> -- publisher
> INSERT INTO test_replica_id_full VALUES (1);
> UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;
>
> The data in subscriber:
> on HEAD:
> postgres=# select * from test_replica_id_full ;
> x | y
> ---+---
> 2 |
> (1 row)
>
> After applying the patch:
> postgres=# select * from test_replica_id_full ;
> x | y
> ---+---
> 1 |
> (1 row)
>
>
Ops, good catch. it seems we forgot to have:

skey[scankey_attoff].sk_flags |= SK_SEARCHNULL;

On head, the index used for this purpose could only be the primary key or
unique key on NOT NULL columns. Now, we do allow NULL values, and need to
search for them. Added that (and your test) to the updated patch.

As a semi-related note, tuples_equal() decides `true` for (NULL = NULL). I
have not changed that, and it seems right in this context. Do you see any
issues with that?

Also, I realized that the functions in the execReplication.c expect only
btree indexes. So, I skipped others as well. If that makes sense, I can
work on a follow-up patch after we can merge this, to remove some of the
limitations mentioned here.

Thanks,
Onder

Attachment Content-Type Size
v8_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/x-patch 71.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-08-23 16:26:55 Re: SQL/JSON features for v15
Previous Message Pavel Stehule 2022-08-23 16:06:22 Re: SQL/JSON features for v15