Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2022-02-09 07:48:10
Message-ID: CAHut+PvTVTXa+HGK-=78ugAS+c+jQPxEhs2FTdTNJzgi1TuJSw@mail.gmail.com
Lists: pgsql-hackers

> Are there any recent performance evaluations of the overhead of row filters? I
> think it'd be good to get some numbers comparing:
>
> 1) $workload with master
> 2) $workload with patch, but no row filters
> 3) $workload with patch, row filter matching everything
> 4) $workload with patch, row filter matching few rows
>
> For workload I think it'd be worth testing:
> a) bulk COPY/INSERT into one table
> b) Many transactions doing small modifications to one table
> c) Many transactions targetting many different tables
> d) Interspersed DDL + small changes to a table
>

We have collected performance results for all of the different
workloads [*].

The test strategy is now using pg_recvlogical with steps as Andres
suggested [1].
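For anyone wanting to reproduce this, one measurement run per that strategy looks roughly like the sketch below. This is only an outline, not the exact harness: the database name "postgres", slot name "rf_slot", and file name "workload_a.sql" are placeholders, and the steps are built as strings and printed rather than executed, since they need a live publisher.

```shell
# Sketch of one measurement run (names "postgres", "rf_slot", and
# "workload_a.sql" are placeholders). The steps are assembled as strings
# and printed; run them in order against a live publisher to reproduce.
DB=postgres
SLOT=rf_slot

# 1. Create a logical replication slot that uses the pgoutput plugin.
STEP1="pg_recvlogical -d $DB -S $SLOT -P pgoutput --create-slot"

# 2. Run the workload script (e.g. workload 'a') on the publisher.
STEP2="psql -d $DB -f workload_a.sql"

# 3. Time how long pg_recvlogical takes to stream all the decoded
#    changes for the publication under test.
STEP3="time pg_recvlogical -d $DB -S $SLOT --start -o proto_version=1 -o publication_names='pub_1' -f /dev/null"

# 4. Drop the slot so the next run starts from a clean state.
STEP4="pg_recvlogical -d $DB -S $SLOT --drop-slot"

printf '%s\n' "$STEP1" "$STEP2" "$STEP3" "$STEP4"
```

In practice step 3 needs a stopping point, e.g. passing --endpos with the publisher's pg_current_wal_lsn() captured after the workload completes, so that pg_recvlogical exits once all changes have been streamed.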

Note - "Allow 0%" and "Allow 100%" are included as test cases, but in
practice a user is unlikely to deliberately use a filter that allows
nothing, or everything, to pass through it.

PSA the bar charts of the results. All other details are below.

~~~~~

RESULTS - workload "a" (v76)
======================
HEAD 18.40
No Filters 18.86
Allow 100% 17.96
Allow 75% 16.39
Allow 50% 14.60
Allow 25% 11.23
Allow 0% 9.41

Observations for "a":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As a higher percentage of data is filtered out (less is replicated),
the times decrease
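As a quick arithmetic cross-check of those observations, the difference of each case relative to HEAD can be computed from the table above (this is not part of the original harness, just the percentages implied by the reported numbers):

```shell
# Percent difference versus HEAD (18.40) for each workload "a" test case,
# using the figures from the results table above.
awk 'BEGIN {
    head = 18.40
    n = split("No-Filters=18.86 Allow-100%=17.96 Allow-75%=16.39 Allow-50%=14.60 Allow-25%=11.23 Allow-0%=9.41", cases, " ")
    for (i = 1; i <= n; i++) {
        split(cases[i], kv, "=")
        printf "%-11s %+6.1f%% vs HEAD\n", kv[1], (kv[2] - head) / head * 100
    }
}'
```

The worst filtered case ("Allow 100%") comes out about 2.4% below HEAD, which is consistent with the "minimal overhead" observation.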

RESULTS - workload "b" (v76)
======================
HEAD 2.30
No Filters 1.96
Allow 100% 1.99
Allow 75% 1.65
Allow 50% 1.35
Allow 25% 1.17
Allow 0% 0.84

Observations for "b":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As a higher percentage of data is filtered out (less is replicated),
the times decrease

RESULTS - workload "c" (v76)
======================
HEAD 20.40
No Filters 19.85
Allow 100% 20.94
Allow 75% 17.26
Allow 50% 16.13
Allow 25% 13.32
Allow 0% 10.33

Observations for "c":
- Using row filters has minimal overhead in the worst case (compare
HEAD versus "Allow 100%")
- As a higher percentage of data is filtered out (less is replicated),
the times decrease

RESULTS - workload "d" (v80)
======================
HEAD 6.81
No Filters 6.85
Allow 100% 7.61
Allow 75% 7.80
Allow 50% 6.46
Allow 25% 6.35
Allow 0% 6.46

Observations for "d":
- As a higher percentage of data is filtered out (less is replicated),
the times fall slightly below HEAD, but not by much.
- Improvements due to row filtering are less noticeable for this
workload (e.g. HEAD versus "Allow 0%"); we attribute this to the fact
that this script replicates far fewer rows in the first place, so we
are only comparing 1000 x INSERT/UPDATE against 0 x INSERT/UPDATE.

~~~~~~

Details - workload "a"
=======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

-- (only one of the following publications is created per test case)
CREATE PUBLICATION pub_1 FOR TABLE test; -- No filter
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

INSERT INTO test SELECT i, i::text, row_to_json(row(i)) FROM
generate_series(1,1000001)i;

Details - workload "b"
======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

-- (only one of the following publications is created per test case)
CREATE PUBLICATION pub_1 FOR TABLE test; -- No filter
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
    FOR i IN 0..1000001 BY 10 LOOP
        INSERT INTO test VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test SET value = 'FOO' WHERE key = i;
        IF i % 1000 = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
END
$do$;

Details - workload "c"
======================

CREATE TABLE test1 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test2 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test3 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test4 (key int, value text, data jsonb, PRIMARY KEY(key, value));
CREATE TABLE test5 (key int, value text, data jsonb, PRIMARY KEY(key, value));

-- (only one of the following publications is created per test case)
CREATE PUBLICATION pub_1 FOR TABLE test1, test2, test3, test4, test5; -- No filter
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 0), test2 WHERE
(key > 0), test3 WHERE (key > 0), test4 WHERE (key > 0), test5 WHERE
(key > 0);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 250000), test2
WHERE (key > 250000), test3 WHERE (key > 250000), test4 WHERE (key >
250000), test5 WHERE (key > 250000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 500000), test2
WHERE (key > 500000), test3 WHERE (key > 500000), test4 WHERE (key >
500000), test5 WHERE (key > 500000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 750000), test2
WHERE (key > 750000), test3 WHERE (key > 750000), test4 WHERE (key >
750000), test5 WHERE (key > 750000);
CREATE PUBLICATION pub_1 FOR TABLE test1 WHERE (key > 1000000), test2
WHERE (key > 1000000), test3 WHERE (key > 1000000), test4 WHERE (key >
1000000), test5 WHERE (key > 1000000);

DO
$do$
BEGIN
    FOR i IN 0..1000001 BY 10 LOOP
        -- test1
        INSERT INTO test1 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test1 SET value = 'FOO' WHERE key = i;
        -- test2
        INSERT INTO test2 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test2 SET value = 'FOO' WHERE key = i;
        -- test3
        INSERT INTO test3 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test3 SET value = 'FOO' WHERE key = i;
        -- test4
        INSERT INTO test4 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test4 SET value = 'FOO' WHERE key = i;
        -- test5
        INSERT INTO test5 VALUES (i, 'BAH', row_to_json(row(i)));
        UPDATE test5 SET value = 'FOO' WHERE key = i;

        IF i % 1000 = 0 THEN
            -- raise notice 'commit: %', i;
            COMMIT;
        END IF;
    END LOOP;
END
$do$;

Details - workload "d"
======================

CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

-- (only one of the following publications is created per test case)
CREATE PUBLICATION pub_1 FOR TABLE test; -- No filter
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

DO
$do$
BEGIN
    FOR i IN 0..1000000 BY 1000 LOOP
        ALTER TABLE test ALTER COLUMN value TYPE varchar(30);
        INSERT INTO test VALUES (i, 'BAH', row_to_json(row(i)));
        ALTER TABLE test ALTER COLUMN value TYPE text;
        UPDATE test SET value = 'FOO' WHERE key = i;
        IF i % 10000 = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
END
$do$;

------
[*] This post repeats some results already sent for workloads
"a", "b", "c", so that the complete set is now all in one place.
[1] https://www.postgresql.org/message-id/20220203182922.344fhhqzjp2ah6yp%40alap3.anarazel.de

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
workload-a-v76.PNG image/png 152.2 KB
workload-d-v80.PNG image/png 153.7 KB
workload-b-v76.PNG image/png 158.0 KB
workload-c-v76.PNG image/png 159.9 KB
