Re: row filtering for logical replication

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2022-02-03 18:29:22
Message-ID: 20220203182922.344fhhqzjp2ah6yp@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-02-01 13:31:36 +1100, Peter Smith wrote:
> TEST STEPS - Workload case a
>
> 1. Run initdb pub and sub and start both postgres instances (use the nosync postgresql.conf)
>
> 2. Run psql for both instances and create tables
> CREATE TABLE test (key int, value text, data jsonb, PRIMARY KEY(key, value));
>
> 3. create the PUBLISHER on pub instance (e.g. choose from below depending on filter)
> CREATE PUBLICATION pub_1 FOR TABLE test; -- 100% (no filter)
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 0); -- 100% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 250000); -- 75% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 500000); -- 50% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 750000); -- 25% allowed
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key > 1000000); -- 0% allowed
>
> 4. create the SUBSCRIBER on sub instance
> CREATE SUBSCRIPTION sync_sub CONNECTION 'host=127.0.0.1 port=5432 dbname=postgres application_name=sync_sub' PUBLICATION pub_1;
>
> 5. On pub side modify the postgresql.conf on the publisher side and restart
> \q quite psql
> edit synchronous_standby_names = 'sync_sub'
> restart the pub instance
>
> 6. Run psql (pub side) and perform the test run.
> \timing
> INSERT INTO test SELECT i, i::text, row_to_json(row(i)) FROM generate_series(1,1000001)i;
> select count(*) from test;
> TRUNCATE test;
> select count(*) from test;
> repeat 6 for each test run.

I think think using syncrep as the mechanism for benchmarking the decoding
side makes the picture less clear than it could be - you're measuring a lot of
things other than the decoding. E.g. the overhead of applying those changes. I
think it'd be more accurate to do something like:

/* create publications, table, etc */

-- create a slot from before the changes
SELECT pg_create_logical_replication_slot('origin', 'pgoutput');

/* the changes you're going to measure */

-- save end LSN
SELECT pg_current_wal_lsn();

-- create a slot for pg_recvlogical to consume
SELECT * FROM pg_copy_logical_replication_slot('origin', 'consume');

-- benchmark, endpos is from pg_current_wal_lsn() above
time pg_recvlogical -S consume --endpos 0/2413A720 --start -o proto_version=3 -o publication_names=pub_1 -f /dev/null -d postgres

-- clean up
SELECT pg_drop_replication_slot('consume');

Then repeat this with the different publications and compare the time taken
for the pg_recvlogical. That way the WAL is exactly the same, there is no
overhead of actually doing anything with the data on the other side, etc.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-02-03 18:34:28 Re: do only critical work during single-user vacuum?
Previous Message Robert Haas 2022-02-03 18:10:57 Re: Server-side base backup: why superuser, not pg_write_server_files?