Re: row filtering for logical replication

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2022-02-02 09:16:23
Message-ID: CAFPTHDa1-KrzeHiPEYE=GF_f8AD4HWtFkey-CJDUpE=2p5bVQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Peter,

I just tried scenario b that Andres suggested:

For scenario b, I did some testing with row-filter-patch v74 and
various levels of filtering. 0% replicated to 100% rows replicated.
The times are in seconds, I did 5 runs each.

Results:

RUN HEAD "with patch 0%" "row-filter-patch 25%" "row-filter-patch
v74 50%" "row-filter-patch 75%" "row-filter-patch v74 100%"
1 17.26178 12.573736 12.869635 13.742167
17.977112 17.75814
2 17.522473 12.919554 12.640879 14.202737
14.515481 16.961836
3 17.124001 12.640879 12.706631 14.220245
15.686613 17.219355
4 17.24122 12.602345 12.674566 14.219423
15.564312 17.432765
5 17.25352 12.610657 12.689842 14.210725
15.613708 17.403821

As can see the performance seen on HEAD is similar to that which the
patch achieves with all rows (100%) replication. The performance
improves linearly with
more rows filtered.

The test scenario used was:

1. On publisher and subscriber:
CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));

2. On publisher: (based on which scenario is being tested)
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed

3. On the subscriber:
CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432
dbname=postgres application_name=sync_sub' PUBLICATION pub_1;

4. now modify the postgresql.conf on the publisher side
synchronous_standby_names = 'sync_sub' and restart.

5. The test case:

DO
$do$
BEGIN
FOR i IN 1..1000001 BY 10 LOOP
INSERT INTO test VALUES(i,'BAH', row_to_json(row(i)));
UPDATE test SET value = 'FOO' WHERE key = i;
IF I % 1000 = 0 THEN
COMMIT;
END IF;
END LOOP;
END
$do$;

regards,
Ajin Cherian
Fujitsu Australia

On Tue, Feb 1, 2022 at 12:07 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Sat, Jan 29, 2022 at 11:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > Are there any recent performance evaluations of the overhead of row filters? I
> > think it'd be good to get some numbers comparing:
> >
> > 1) $workload with master
> > 2) $workload with patch, but no row filters
> > 3) $workload with patch, row filter matching everything
> > 4) $workload with patch, row filter matching few rows
> >
> > For workload I think it'd be worth testing:
> > a) bulk COPY/INSERT into one table
> > b) Many transactions doing small modifications to one table
> > c) Many transactions targetting many different tables
> > d) Interspersed DDL + small changes to a table
> >
>
> I have gathered performance data for the workload case (a):
>
> HEAD 46743.75
> v74 no filters 46929.15
> v74 allow 100% 46926.09
> v74 allow 75% 40617.74
> v74 allow 50% 35744.17
> v74 allow 25% 29468.93
> v74 allow 0% 22540.58
>
> PSA.
>
> This was tested using patch v74 and synchronous pub/sub. There are 1M
> INSERTS for publications using differing amounts of row filtering (or
> none).
>
> Observations:
> - There seems insignificant row-filter overheads (e.g. viz no filter
> and 100% allowed versus HEAD).
> - The elapsed time decreases linearly as there is less data getting replicated.
>
> I will post the results for other workload kinds (b, c, d) when I have them.
>
> ------
> Kind Regards,
> Peter Smith.
> Fujitsu Australia.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2022-02-02 10:13:29 Re: CREATE TABLE ( .. STORAGE ..)
Previous Message Daniel Gustafsson 2022-02-02 09:06:13 Re: Plug minor memleak in pg_dump