Re: row filtering for logical replication

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-02-22 10:45:11
Message-ID: CACawEhW0NUMHHAQjVP_xM8gWXKvy9haK9nKhC-H=Qn2yTCv-=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thanks for working on this. I did some review and testing, please see my
comments below.

1)

0008 is only for debug purposes (I'm
> not sure some of these messages will be part of the final patch).
>

I think if you are planning to keep the debug patch, there seems to be an
area of improvement in the following code:

/* If the tuple does not match one of the row filters, bail out */
+ s = TextDatumGetCString(DirectFunctionCall2(pg_get_expr,
CStringGetTextDatum(nodeToString(rfnode)),
ObjectIdGetDatum(relation->rd_id)));
+ if (result)
+ elog(DEBUG2, "pgoutput_row_filter: row filter \"%s\" matched", s);
+ else
+ elog(DEBUG2, "pgoutput_row_filter: row filter \"%s\" not matched", s);
+ pfree(s);
+

We only need to calculate "s" if the debug level is DEBU2 or higher. So, we
could maybe do something like this:

if (log_min_messages <= DEBUG2 || client_min_messages <= DEBUG2)
{
/* and the code block is moved to here */
}

2) I have done some tests with some different expressions that don't exist
on the regression tests, just to make sure that we don't have any edge
cases. All seems to work fine for the expressions
like: (column/1.0)::int::bool::text::bool, CASE WHEN column1> 4000 THEN
column2/ 100 > 1 ELSE false END, COALESCE((column/50000)::bool, false),
NULLIF((column/50000)::int::bool, false), column IS DISTINCT FROM 50040,
row(column, 2, 3) > row(2000, 2, 3), (column IS DISTINCT FROM), column IS
NULL, column IS NOT NULL, composite types

3) As another edge case exploration, I tested with tables/types on
different schemas with escape chars on the schemas/custom types etc. All
looks good.

4) In terms of performance, I also separately verified that the overhead
seems pretty low with the final patch. I used the tests in
commands_to_perf_test.sql file which I shared earlier. The steps in the
file do not intend to measure the time precisely per tuple, but just to see
if there is any noticeable regression while moving lots of data. The
difference between (a) no filter (b) simple filter is between %1-%4, which
could even be considered in the noise level.

5) I guess we can by-pass the function limitation via operators. Do you see
anything problematic with that? I think that should be allowed as it helps
power users to create more complex replications if they need.

CREATE FUNCTION simple_f(int, int) RETURNS bool
> AS $$ SELECT hashint4($1) > $2 $$
> LANGUAGE SQL;
> CREATE OPERATOR =*>
> (
> PROCEDURE = simple_f,
> LEFTARG = int,
> RIGHTARG = int
> );
> CREATE PUBLICATION pub_1 FOR TABLE test WHERE (key =*> 1000);

5.1) It might be useful to have a regression test which has an user-defined
operator on the WHERE clause, and DROP without cascade is not allowed so
that we cover recordDependencyOnExpr() calls in the tests.

Thanks,
Onder KALACI
Software Engineer at Microsoft

Euler Taveira <euler(at)eulerto(dot)com>, 2 Şub 2021 Sal, 13:34 tarihinde şunu
yazdı:

> On Tue, Feb 2, 2021, at 8:38 AM, japin wrote:
>
> In 0003 patch, function GetPublicationRelationQuals() has been defined,
> but it
> never used. So why should we define it?
>
> Thanks for taking a look again. It is an oversight. It was introduced in an
> attempt to refactor ALTER PUBLICATION SET TABLE. In
> AlterPublicationTables, we
> could possibly keep some publication-table mappings that does not change,
> however, since commit 3696a600e2, it is required to find the qual for all
> inheritors (see GetPublicationRelations). I explain this decision in the
> following comment:
>
> /*
> * Remove all publication-table mappings. We could possibly
> * remove (i) tables that are not found in the new table list
> and
> * (ii) tables that are being re-added with a different qual
> * expression. For (ii), simply updating the existing tuple is
> not
> * enough, because of qual expression dependencies.
> */
>
> I will post a new patch set later.
>
>
> --
> Euler Taveira
> EDB https://www.enterprisedb.com/
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2021-02-22 10:52:15 Re: Disallow SSL compression?
Previous Message Vik Fearing 2021-02-22 10:05:09 Re: SEARCH and CYCLE clauses