Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-10-14 16:04:02
Message-ID: CACawEhWdoiaXwKjn5ZfAKF_Toz=t1aqT1ewHmJLRnXmKOQKwyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for the review!

Here are some comments on v17 patch.
>
> 1.
> -LogicalRepRelMapEntry *
> +LogicalRepPartMapEntry *
> logicalrep_partition_open(LogicalRepRelMapEntry *root,
> Relation partrel,
> AttrMap *map)
> {
>
> Is there any reason to change the return type of
> logicalrep_partition_open()? It
> seems ok without this change.
>

I think you are right, I probably needed that in some of my
earlier iterations of the patch, but now it seems redundant. Reverted back
to the original version.

>
> 2.
>
> + * of the relation cache entry (e.g., such as ANALYZE or
> + * CREATE/DROP index on the relation).
>
> "e.g." and "such as" mean the same. I think we remove one of them.
>

fixed

>
> 3.
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select (idx_scan = 2) from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for'check subscriber tap_sub_rep_full
> deletes one row via index";
> +
>
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idy';}
> +) or die "Timed out while waiting for'check subscriber tap_sub_rep_full
> deletes one row via index";
>
>
> "for'check" -> "for check"
>

fixed

>
> 3.
> +$node_subscriber->safe_psql('postgres',
> + "SELECT pg_reload_conf();");
> +
> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
> +# ====================================================================
> +
> +$node_subscriber->stop('fast');
> +$node_publisher->stop('fast');
> +
>
> "Testcase start" in the comment should be "Testcase end".
>
>
fixed

> 4.
> There seems to be a problem in the following scenario, which results in
> inconsistent data between publisher and subscriber.
>
> -- publisher
> CREATE TABLE test_replica_id_full (x int, y int);
> ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
>
> -- subscriber
> CREATE TABLE test_replica_id_full (x int, y int);
> CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(x);
> CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> port=5432' PUBLICATION tap_pub_rep_full;
>
> -- publisher
> INSERT INTO test_replica_id_full VALUES (NULL,1);
> INSERT INTO test_replica_id_full VALUES (NULL,2);
> INSERT INTO test_replica_id_full VALUES (NULL,3);
> update test_replica_id_full SET x=1 where y=2;
>
> The data in publisher:
> postgres=# select * from test_replica_id_full order by y;
> x | y
> ---+---
> | 1
> 1 | 2
> | 3
> (3 rows)
>
> The data in subscriber:
> postgres=# select * from test_replica_id_full order by y;
> x | y
> ---+---
> | 2
> 1 | 2
> | 3
> (3 rows)
>
> There is no such problem on master branch.
>
>
Uff, the second problem reported regarding NULL values for this patch (both
by you). First, v18 contains the fix for the problem. It turns out that my
idea of treating all unique indexes (pkey, replica identity and unique
regular indexes) the same proved to be wrong. The former two require all
the involved columns to have NOT NULL. The latter not.

This resulted in RelationFindReplTupleByIndex() to skip tuples_equal() for
regular unique indexes (e.g., non pkey/replid). Hence, the first NULL value
is considered the matching tuple. Instead, we should be doing a full tuple
equality check (e.g., tuples_equal). This is what v18 does. Also, add the
above scenario as a test.

I think we can probably skip tuples_equal() for unique indexes that consist
of only NOT NULL columns. However, that seems like an over-optimization. If
you have such a unique index, why not create a primary key anyway? That's
why I don't see much value in compicating the code for that use case.

Thanks for the review & testing. I'll focus more on the NULL values on my
own testing as well. Still, I wanted to push my changes so that you can
also have a look if possible.

Attach v18.

Onder KALACI

Attachment Content-Type Size
v18_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/octet-stream 87.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2022-10-14 16:31:15 Add regular expression testing for user name mapping in the peer authentication TAP test
Previous Message Karina Litskevich 2022-10-14 15:16:56 Re: Error for WITH options on partitioned tables