Re: row filtering for logical replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-22 03:52:52
Message-ID: CALDaNm2gOVeXnEr_NXveP4rA=DbZ7kV5E6X-Jh5Jn4O5eXOxWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 21, 2021 at 2:29 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here is the v51* patch set:
>

I tweaked the query slightly based on Euler's changes, the explain
analyze of the updated query based on Euler's suggestions, existing
query and Euler's query is given below:
1) updated query based on Euler's suggestion:
explain analyze SELECT DISTINCT pg_get_expr(prqual, prrelid) FROM
pg_publication p INNER JOIN pg_publication_rel pr ON (p.oid =
pr.prpubid) WHERE pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
AND NOT (select bool_or(puballtables) FROM pg_publication
WHERE pubname in ( 'pub1' )) AND NOT EXISTS (SELECT 1 FROM
pg_publication_namespace pn, pg_class c WHERE c.oid = 16384 AND
c.relnamespace = pn.pnnspid);
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=14.68..14.69 rows=1 width=32) (actual time=0.121..0.126
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Aggregate (cost=1.96..1.97 rows=1 width=1) (actual
time=0.025..0.026 rows=1 loops=1)
-> Seq Scan on pg_publication (cost=0.00..1.96 rows=1
width=1) (actual time=0.016..0.017 rows=1 loops=1)
Filter: (pubname = 'pub1'::name)
InitPlan 2 (returns $1)
-> Nested Loop (cost=0.27..8.30 rows=1 width=0) (actual
time=0.002..0.003 rows=0 loops=1)
Join Filter: (pn.pnnspid = c.relnamespace)
-> Seq Scan on pg_publication_namespace pn
(cost=0.00..0.00 rows=1 width=4) (actual time=0.001..0.002 rows=0
loops=1)
-> Index Scan using pg_class_oid_index on pg_class c
(cost=0.27..8.29 rows=1 width=4) (never executed)
Index Cond: (oid = '16384'::oid)
-> Sort (cost=4.40..4.41 rows=1 width=32) (actual
time=0.119..0.121 rows=1 loops=1)
Sort Key: (pg_get_expr(pr.prqual, pr.prrelid)) COLLATE "C"
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.00..4.39 rows=1 width=32) (actual
time=0.094..0.098 rows=1 loops=1)
One-Time Filter: ((NOT $0) AND (NOT $1))
-> Nested Loop (cost=0.00..4.39 rows=1 width=36)
(actual time=0.013..0.015 rows=1 loops=1)
Join Filter: (p.oid = pr.prpubid)
-> Seq Scan on pg_publication p
(cost=0.00..1.96 rows=1 width=4) (actual time=0.004..0.005 rows=1
loops=1)
Filter: (pubname = 'pub1'::name)
-> Seq Scan on pg_publication_rel pr
(cost=0.00..2.41 rows=1 width=40) (actual time=0.005..0.005 rows=1
loops=1)
Filter: (prrelid = '16384'::oid)
Planning Time: 1.014 ms
Execution Time: 0.259 ms
(24 rows)

2) Existing query:
postgres=# explain analyze SELECT DISTINCT pg_get_expr(prqual,
prrelid) FROM pg_publication p
INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid) WHERE
pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
AND NOT (select bool_or(puballtables) FROM pg_publication
WHERE pubname in ( 'pub1' ))
AND (SELECT count(1)=0 FROM pg_publication_namespace pn,
pg_class c WHERE c.oid = 16384 AND c.relnamespace = pn.pnnspid);
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=14.69..14.70 rows=1 width=32) (actual time=0.162..0.166
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Aggregate (cost=1.96..1.97 rows=1 width=1) (actual
time=0.023..0.025 rows=1 loops=1)
-> Seq Scan on pg_publication (cost=0.00..1.96 rows=1
width=1) (actual time=0.014..0.016 rows=1 loops=1)
Filter: (pubname = 'pub1'::name)
InitPlan 2 (returns $1)
-> Aggregate (cost=8.30..8.32 rows=1 width=1) (actual
time=0.044..0.045 rows=1 loops=1)
-> Nested Loop (cost=0.27..8.30 rows=1 width=0) (actual
time=0.028..0.029 rows=0 loops=1)
Join Filter: (pn.pnnspid = c.relnamespace)
-> Seq Scan on pg_publication_namespace pn
(cost=0.00..0.00 rows=1 width=4) (actual time=0.004..0.004 rows=0
loops=1)
-> Index Scan using pg_class_oid_index on pg_class c
(cost=0.27..8.29 rows=1 width=4) (never executed)
Index Cond: (oid = '16384'::oid)
-> Sort (cost=4.40..4.41 rows=1 width=32) (actual
time=0.159..0.161 rows=1 loops=1)
Sort Key: (pg_get_expr(pr.prqual, pr.prrelid)) COLLATE "C"
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.00..4.39 rows=1 width=32) (actual
time=0.142..0.147 rows=1 loops=1)
One-Time Filter: ((NOT $0) AND $1)
-> Nested Loop (cost=0.00..4.39 rows=1 width=36)
(actual time=0.016..0.018 rows=1 loops=1)
Join Filter: (p.oid = pr.prpubid)
-> Seq Scan on pg_publication p
(cost=0.00..1.96 rows=1 width=4) (actual time=0.007..0.009 rows=1
loops=1)
Filter: (pubname = 'pub1'::name)
-> Seq Scan on pg_publication_rel pr
(cost=0.00..2.41 rows=1 width=40) (actual time=0.004..0.004 rows=1
loops=1)
Filter: (prrelid = '16384'::oid)
Planning Time: 0.966 ms
Execution Time: 0.327 ms
(25 rows)

3) Euler’s Query:
explain analyze SELECT DISTINCT pg_catalog.pg_get_expr(pr.prqual,
pr.prrelid) FROM pg_catalog.pg_publication p
INNER JOIN pg_catalog.pg_publication_rel pr ON (p.oid = pr.prpubid)
WHERE pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
AND NOT (SELECT pg_catalog.bool_or(b.puballtables) FROM
pg_catalog.pg_publication b WHERE b.pubname IN ( 'pub1' ))
AND NOT EXISTS( SELECT 1 FROM
pg_catalog.pg_publication_namespace pn INNER JOIN
pg_catalog.pg_class c ON (pn.pnnspid = c.relnamespace) WHERE
c.oid = pr.prrelid)
;

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=14.69..14.70 rows=1 width=32) (actual time=0.231..0.236
rows=1 loops=1)
InitPlan 1 (returns $0)
-> Aggregate (cost=1.96..1.97 rows=1 width=1) (actual
time=0.031..0.032 rows=1 loops=1)
-> Seq Scan on pg_publication b (cost=0.00..1.96 rows=1
width=1) (actual time=0.019..0.021 rows=1 loops=1)
Filter: (pubname = 'pub1'::name)
-> Sort (cost=12.71..12.72 rows=1 width=32) (actual
time=0.228..0.231 rows=1 loops=1)
Sort Key: (pg_get_expr(pr.prqual, pr.prrelid)) COLLATE "C"
Sort Method: quicksort Memory: 25kB
-> Result (cost=0.27..12.70 rows=1 width=32) (actual
time=0.205..0.210 rows=1 loops=1)
One-Time Filter: (NOT $0)
-> Nested Loop (cost=0.27..12.70 rows=1 width=36)
(actual time=0.103..0.107 rows=1 loops=1)
Join Filter: (pr.prpubid = p.oid)
-> Nested Loop Anti Join (cost=0.27..10.73
rows=1 width=40) (actual time=0.093..0.096 rows=1 loops=1)
Join Filter: (c.oid = pr.prrelid)
-> Seq Scan on pg_publication_rel pr
(cost=0.00..2.41 rows=1 width=40) (actual time=0.008..0.009 rows=1
loops=1)
Filter: (prrelid = '16384'::oid)
-> Nested Loop (cost=0.27..8.30 rows=1
width=4) (actual time=0.079..0.080 rows=0 loops=1)
Join Filter: (pn.pnnspid = c.relnamespace)
-> Index Scan using
pg_class_oid_index on pg_class c (cost=0.27..8.29 rows=1 width=8)
(actual time=0.069..0.072 rows=1 loops=1)
Index Cond: (oid = '16384'::oid)
-> Seq Scan on
pg_publication_namespace pn (cost=0.00..0.00 rows=1 width=4) (actual
time=0.005..0.005 rows=0 loops=1)
-> Seq Scan on pg_publication p
(cost=0.00..1.96 rows=1 width=4) (actual time=0.007..0.007 rows=1
loops=1)
Filter: (pubname = 'pub1'::name)
Planning Time: 1.067 ms
Execution Time: 0.431 ms
(25 rows)

Combining existing query to include NOT EXISTS based on Euler's
changes seems to be better:
SELECT DISTINCT pg_get_expr(prqual, prrelid) FROM pg_publication p
INNER JOIN pg_publication_rel pr ON (p.oid = pr.prpubid)
WHERE pr.prrelid = 16384 AND p.pubname IN ( 'pub1' )
AND NOT (select bool_or(puballtables)
FROM pg_publication
WHERE pubname in ( 'pub1' ))
AND NOT EXISTS (SELECT 1
FROM pg_publication_namespace pn, pg_class c
WHERE c.oid = 16384 AND c.relnamespace = pn.pnnspid);

Thoughts?

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-12-22 04:56:23 Re: sequences vs. synchronous replication
Previous Message Masahiko Sawada 2021-12-22 03:35:30 Re: parallel vacuum comments