Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2022-02-03 00:31:00
Message-ID: CAHut+PsBXHXoQviWuayvcDUX4LKhVBn_OkL1wz7RPW3XEk9Vag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 2, 2022 at 8:16 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> Hi Peter,
>
> I just tried scenario b that Andres suggested:
>
> For scenario b, I did some testing with row-filter-patch v74 and
> various levels of filtering. 0% replicated to 100% rows replicated.
> The times are in seconds, I did 5 runs each.
>
> Results:
>
> RUN HEAD "with patch 0%" "row-filter-patch 25%" "row-filter-patch
> v74 50%" "row-filter-patch 75%" "row-filter-patch v74 100%"
> 1 17.26178 12.573736 12.869635 13.742167
> 17.977112 17.75814
> 2 17.522473 12.919554 12.640879 14.202737
> 14.515481 16.961836
> 3 17.124001 12.640879 12.706631 14.220245
> 15.686613 17.219355
> 4 17.24122 12.602345 12.674566 14.219423
> 15.564312 17.432765
> 5 17.25352 12.610657 12.689842 14.210725
> 15.613708 17.403821
>
> As can see the performance seen on HEAD is similar to that which the
> patch achieves with all rows (100%) replication. The performance
> improves linearly with
> more rows filtered.
>
> The test scenario used was:
>
> 1. On publisher and subscriber:
> CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));
>
> 2. On publisher: (based on which scenario is being tested)
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed
>
> 3. On the subscriber:
> CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432
> dbname=postgres application_name=sync_sub' PUBLICATION pub_1;
>
> 4. now modify the postgresql.conf on the publisher side
> synchronous_standby_names = 'sync_sub' and restart.
>
> 5. The test case:
>
> DO
> $do$
> BEGIN
> FOR i IN 1..1000001 BY 10 LOOP
> INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
> UPDATE test SET value = 'FOO' WHERE key = i;
> IF I % 1000 = 0 THEN
> COMMIT;
> END IF;
> END LOOP;
> END
> $do$;
>
>

Thanks!

I have put your results as a bar chart same as for the previous workload case:

HEAD 17.25
v74 no filters NA
v74 allow 100% 17.35
v74 allow 75% 15.62
v74 allow 50% 14.21
v74 allow 25% 12.69
v74 allow 0% 12.62

PSA.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

Attachment Content-Type Size
workload-b.PNG image/png 160.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2022-02-03 00:35:36 Unclear problem reports
Previous Message Bruce Momjian 2022-02-02 23:02:39 Re: Replace pg_controldata output fields with macros for better code manageability