RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-08 02:43:56
Message-ID: OSZPR01MB63105E246E96C5ECF8131C09FDB49@OSZPR01MB6310.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 7, 2023 9:47 PM Önder Kalacı <onderkalaci(at)gmail(dot)com> wrote:
>
> I'm attaching v35.
>

I noticed that if the index column only exists on the subscriber side, this index
can also be chosen. This seems a bit odd because the index column isn't sent
from publisher.

e.g.
-- pub
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
-- sub
CREATE TABLE test_replica_id_full (x int, y int, z int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(z);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

I didn't see in any cases the behavior changed after applying the patch, which
looks good. Besides, I tested the performance for such case.

Steps:
1. create tables, index, publication, and subscription
-- pub
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;
-- sub
create table tbl (a int, b int);
create index idx_b on tbl(b);
create subscription sub connection 'dbname=postgres port=5432' publication pub;
2. setup synchronous replication
3. execute SQL:
truncate tbl;
insert into tbl select i from generate_series(0,10000)i;
update tbl set a=a+1;

The time of UPDATE (take the average of 10 runs):
master: 1356.06 ms
patched: 3968.14 ms

For the cases that all values of extra columns on the subscriber are NULL, index
scan can't do better than sequential scan. This is not a real scenario and I
think it only degrades when there are many NULL values in the index column, so
this is probably not a case to worry about. I just share this case and then we
can discuss should we pick the index which only contain the extra columns on the
subscriber.

Regards,
Shi Yu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wangw.fnst@fujitsu.com 2023-03-08 02:54:26 RE: Rework LogicalOutputPluginWriterUpdateProgress
Previous Message David G. Johnston 2023-03-08 02:31:42 Re: psql: Add role's membership options to the \du+ command