Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-09-19 15:31:55
Message-ID: CACawEhXgSkPQoivkVSb8LYTr+e8j+t9rO2V_tPWH5K-e7zdrBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hayato Kuroda,

Thanks for the review, please see my reply below:

> ===
> For execRelation.c
>
> 01. RelationFindReplTupleByIndex()
>
> ```
> /* Start an index scan. */
> InitDirtySnapshot(snap);
> - scan = index_beginscan(rel, idxrel, &snap,
> -
> IndexRelationGetNumberOfKeyAttributes(idxrel),
> - 0);
>
> /* Build scan key. */
> - build_replindex_scan_key(skey, rel, idxrel, searchslot);
> + scankey_attoff = build_replindex_scan_key(skey, rel, idxrel,
> searchslot);
>
> + scan = index_beginscan(rel, idxrel, &snap, scankey_attoff, 0);
> ```
>
> I think "/* Start an index scan. */" should be just above
> index_beginscan().
>

moved there

>
> ===
> For worker.c
>
> 02. sable_indexoid_internal()
>
> ```
> + * Note that if the corresponding relmapentry has InvalidOid
> usableIndexOid,
> + * the function returns InvalidOid.
> + */
> +static Oid
> +usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo
> *relinfo)
> ```
>
> "InvalidOid usableIndexOid" should be "invalid usableIndexOid,"
>

makes sense, updated

>
> 03. check_relation_updatable()
>
> ```
> * We are in error mode so it's fine this is somewhat slow. It's
> better to
> * give user correct error.
> */
> - if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
> + if (OidIsValid(rel->usableIndexOid))
> {
> ```
>
> Shouldn't we change the above comment to? The check is no longer slow.
>

Hmm, I couldn't realize this comment earlier. So you suggest "slow" here
refers to the additional function call "GetRelationIdentityOrPK"? If so,
yes I'll update that.

>
> ===
> For relation.c
>
> 04. GetCheapestReplicaIdentityFullPath()
>
> ```
> +static Path *
> +GetCheapestReplicaIdentityFullPath(Relation localrel)
> +{
> + PlannerInfo *root;
> + Query *query;
> + PlannerGlobal *glob;
> + RangeTblEntry *rte;
> + RelOptInfo *rel;
> + int attno;
> + RangeTblRef *rt;
> + List *joinList;
> + Path *seqScanPath;
> ```
>
> I think the part that constructs dummy-planner state can be move to
> another function
> because that part is not meaningful for this.
> Especially line 824-846 can.
>
>
Makes sense, simplified the function. Though, it is always hard to pick
good names for these kinds of helper functions. I
picked GenerateDummySelectPlannerInfoForRelation(), does that sound good to
you as well?

>
> ===
> For 032_subscribe_use_index.pl
>
> 05. general
>
> ```
> +# insert some initial data within the range 0-1000
> +$node_publisher->safe_psql('postgres',
> + "INSERT INTO test_replica_id_full SELECT i%20 FROM
> generate_series(0,1000)i;"
> +);
> ```
>
> It seems that the range of initial data seems [0, 19].
> Same mistake-candidates are found many place.
>

Ah, several copy & paste errors. Fixed (hopefully) all.

>
> 06. general
>
> ```
> +# updates 1000 rows
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
> ```
>
> Only 50 tuples are modified here.
> Same mistake-candidates are found many place.
>

Alright, yes there were several wrong comments in the tests. I went over
the tests once more to fix those and improve comments.

>
> 07. general
>
> ```
> +# we check if the index is used or not
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes
> where indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber tap_sub_rep_full_3
> updates 200 rows via index";
> ```
> The query will be executed until the index scan is finished, but it may be
> not commented.
> How about changing it to "we wait until the index used on the
> subscriber-side." or something?
> Same comments are found in many place.
>

Makes sense, updated

>
> 08. test related with ANALYZE
>
> ```
> +# Testcase start: SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE
> - PARTITIONED TABLE
> +# ====================================================================
> ```
>
> "Testcase start:" should be "Testcase end:" here.
>

thanks, fixed

>
> 09. general
>
> In some tests results are confirmed but in other test they are not.
> I think you can make sure results are expected in any case if there are no
> particular reasons.
>
>
Alright, yes I also don't see a reason not to do that. Added to all cases.

I'll attach the patch with the next email as I also want to incorporate the
other comments. Hope this is not going to be confusing.

Thanks,
Onder

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Önder Kalacı 2022-09-19 15:32:03 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Previous Message Matthias van de Meent 2022-09-19 15:29:14 Re: Pruning never visible changes