Quick Links

RE: row filtering for logical replication

From:	"tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	RE: row filtering for logical replication
Date:	2022-02-14 10:30:19
Message-ID:	OS0PR01MB61138C46049C3534C878B79BFB339@OS0PR01MB6113.jpnprd01.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Saturday, January 29, 2022 9:31 AM, From: Andres Freund <andres(at)anarazel(dot)de>
> Hi,
>
> Are there any recent performance evaluations of the overhead of row filters?
> I
> think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targetting many different tables
> d) Interspersed DDL + small changes to a table

I did the performance test for this patch in two ways:
(1) using pg_recvlogical
(2) using synchronous pub/sub

The results are as below, also attach the bar charts and the details.

Note that the result of performance test using pg_recvlogical is based on v80,
and the one using synchronous pub/sub is based on v81. (I think v80 should have
the same performance as V81 because V81 only fix some test related code compared
with V80)

(1) Using pg_recvlogical

RESULTS - workload "a"
-----------------------------
HEAD 4.350
No Filters 4.413
Allow 100% 4.463
Allow 75% 4.079
Allow 50% 3.765
Allow 25% 3.415
Allow 0% 3.104

RESULTS - workload "b"
-----------------------------
HEAD 0.568
No Filters 0.569
Allow 100% 0.590
Allow 75% 0.510
Allow 50% 0.441
Allow 25% 0.370
Allow 0% 0.302

RESULTS - workload "c"
-----------------------------
HEAD 2.752
No Filters 2.812
Allow 100% 2.846
Allow 75% 2.506
Allow 50% 2.147
Allow 25% 1.806
Allow 0% 1.448

RESULTS - workload "d"
-----------------------------
HEAD 5.612
No Filters 5.645
Allow 100% 5.696
Allow 75% 5.648
Allow 50% 5.532
Allow 25% 5.379
Allow 0% 5.196

Summary of tests:
(a) As more data is filtered out, less time is spend.
(b) The case where no rows are filtered (worst case), there is a overhead of
1-4%. This should be okay as normally nobody will set up filters which doesn't
filter any rows.
(c) There is slight difference in HEAD and No filter (0-2%) case but some of
that could also be attributed to run-to-run variation because in some runs no
filter patch was taking lesser time and in other cases HEAD is taking lesser
time.

(2) Using synchronous pub/sub

RESULTS - workload "a"
-----------------------------
HEAD 9.671
No Filters 9.727
Allow 100% 10.336
Allow 75% 8.544
Allow 50% 7.598
Allow 25% 5.988
Allow 0% 4.542

RESULTS - workload "b"
-----------------------------
HEAD 53.869
No Filters 53.531
Allow 100% 52.679
Allow 75% 39.782
Allow 50% 26.563
Allow 25% 13.506
Allow 0% 0.296

RESULTS - workload "c"
-----------------------------
HEAD 52.378
No Filters 52.432
Allow 100% 51.974
Allow 75% 39.452
Allow 50% 26.604
Allow 25% 13.944
Allow 0% 1.194

RESULTS - workload "d"
-----------------------------
HEAD 57.457
No Filters 57.385
Allow 100% 57.608
Allow 75% 43.575
Allow 50% 29.689
Allow 25% 15.786
Allow 0% 2.879

Summary of tests:
(a) As more data is filtered out, less time is spend.
(b) The case where no rows are filtered (worst case).
There is a overhead in scenario a (bulk INSERT). This should be okay as normally
nobody will set up filters which doesn't filter any rows.
In other scenarios (doing small modifications to one table, targeting many
different tables, and Interspersed DDL + small changes to a table), there is
almost no overhead.
(c) There is almost no time difference in HEAD and No filter.

Regards,
Tang

Attachment	Content-Type	Size
	image/png	47.7 KB
	image/png	47.4 KB
	image/png	46.6 KB
performance_test_using_pg_recvlogical.txt	text/plain	4.5 KB
performance_test_using_sync_replication.txt	text/plain	6.0 KB
	image/png	48.3 KB
	image/png	47.9 KB
	image/png	46.4 KB
	image/png	47.1 KB
	image/png	47.1 KB

In response to

Re: row filtering for logical replication at 2022-01-29 00:31:10 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alexander Pyhalov	2022-02-14 10:53:35	postgres_fdw and skip locked
Previous Message	Andrey V. Lepikhov	2022-02-14 10:22:30	Re: Merging statistics from children instead of re-sampling everything