RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-02-15 03:53:34
Message-ID: OSZPR01MB63103A4AFBBA56BAF8AE7FAAFDA39@OSZPR01MB6310.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 4, 2023 7:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı <onderkalaci(at)gmail(dot)com> wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a seperate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
>
> I agree that it could be a separate patch. However, do you think we
> need some way to disable picking the index scan? This is to avoid
> cases where sequence scan could be better or do we think there won't
> exist such a case?
>

I think such a case exists. I tried the following cases based on v23 patch.

# Step 1.
Create publication, subscription and tables.
-- on publisher
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;

-- on subscriber
create table tbl (a int);
create index idx_a on tbl(a);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

# Step 2.
Setup synchronous replication.

# Step 3.
Execute SQL query on publisher.

-- case 1 (All values are duplicated)
truncate tbl;
insert into tbl select 1 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 2
truncate tbl;
insert into tbl select i%3 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 3
truncate tbl;
insert into tbl select i%5 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 4
truncate tbl;
insert into tbl select i%10 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 5
truncate tbl;
insert into tbl select i%100 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 6
truncate tbl;
insert into tbl select i%1000 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 7 (No duplicate value)
truncate tbl;
insert into tbl select i from generate_series(0,10000)i;
update tbl set a=a+1;

# Result
The time executing update (the average of 3 runs is taken, the unit is
milliseconds):

+--------+---------+---------+
| | patched | master |
+--------+---------+---------+
| case 1 | 3933.68 | 1298.32 |
| case 2 | 1803.46 | 1294.42 |
| case 3 | 1380.82 | 1299.90 |
| case 4 | 1042.60 | 1300.20 |
| case 5 | 691.69 | 1297.51 |
| case 6 | 578.50 | 1300.69 |
| case 7 | 566.45 | 1302.17 |
+--------+---------+---------+

In case 1~3, there's an overhead after applying the patch. In other cases, the
patch improved the performance. As more duplicate values, the greater the
overhead after applying the patch.

Regards,
Shi Yu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-02-15 04:00:26 Re: Support logical replication of DDLs
Previous Message Richard Guo 2023-02-15 03:31:44 Wrong query results caused by loss of join quals