RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-10-18 06:46:05
Message-ID: OS3PR01MB62754E945BD95A9B2A1164789E289@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 23, 2022 at 0:14 AM Önder Kalacı <onderkalaci(at)gmail(dot)com> wrote:
> Hii Wang wei,

Thanks for updating the patch and your reply.

> > 1. In the function GetCheapestReplicaIdentityFullPath.
> > + if (rel->pathlist == NIL)
> > + {
> > + /*
> > + * A sequential scan could have been dominated by by an index
> > scan
> > + * during make_one_rel(). We should always have a sequential
> > scan
> > + * before set_cheapest().
> > + */
> > + Path *seqScanPath = create_seqscan_path(root, rel, NULL,
> > 0);
> > +
> > + add_path(rel, seqScanPath);
> > + }
> >
> > This is a question I'm not sure about:
> > Do we need this part to add sequential scan?
> >
> > I think in our case, the sequential scan seems to have been added by the
> > function make_one_rel (see function set_plain_rel_pathlist).
>
> Yes, the sequential scan is added during make_one_rel.
>
> > If I am missing something, please let me know. BTW, there is a typo in
> > above comment: `by by`.
>
> As the comment mentions, the sequential scan could have been dominated &
> removed by index scan, see add_path():
>
> *We also remove from the rel's pathlist any old paths that are dominated
> * by new_path --- that is, new_path is cheaper, at least as well ordered,
> * generates no more rows, requires no outer rels not required by the old
> * path, and is no less parallel-safe.
>
> Still, I agree that the comment could be improved, which I pushed.

Oh, sorry I didn't realize this part of the logic. Thanks for sharing this.

And I have another confusion about function GetCheapestReplicaIdentityFullPath:
If rel->pathlist is NIL, could we return NULL directly from this function, and
then set idxoid to InvalidOid in function FindUsableIndexForReplicaIdentityFull
in that case?

===

Here are some comments for test file 032_subscribe_use_index.pl on v18 patch:

1.
```
+# Basic test where the subscriber uses index
+# and only updates 1 row for and deletes
+# 1 other row
```
There seems to be an extra "for" here.

2. Typos for subscription name in the error messages.
tap_sub_rep_full_0 -> tap_sub_rep_full

3. Typo in comments
```
+# use the newly created index (provided that it fullfils the requirements).
```
fullfils -> fulfils

4. Some extra single quotes at the end of the error message ('").
For example:
```
# wait until the index is used on the subscriber
$node_subscriber->poll_query_until(
'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index'";
```

5. The column names in the error message appear to be a typo.
```
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-1";
...
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-3";
...
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-4";
```
It seems that we need to do the following change: 'column-3' -> 'column-1' and
'column-4' -> 'column-2'.
Or we could use the column names directly like this: 'column-1' -> 'column a',
'column_3' -> 'column a' and 'column_4' -> 'column b'.

6. DELETE action is missing from the error message.
```
+# 2 rows from first command, another 2 from the second command
+# overall index_on_child_1_a is used 4 times
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where indexrelname = 'index_on_child_1_a';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates child_1 table'";
```
I think we execute both UPDATE and DELETE for child_1 here. Could we add DELETE
action to this error message?

7. Table name in the error message.
```
# check if the index is used even when the index has NULL values
$node_subscriber->poll_query_until(
'postgres', q{select idx_scan=2 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates parent table'";
```
It seems to be "test_replica_id_full" here instead of "parent'".

Regards,
Wang wei

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2022-10-18 07:14:21 Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Previous Message Bharath Rupireddy 2022-10-18 06:31:07 Simplify standby state machine a bit in WaitForWALToBecomeAvailable()