Re: row filtering for logical replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-09-23 12:33:32
Message-ID: 574b4e78-2f35-acf3-4bdc-4b872582e739@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I finally had time to take a closer look at the patch again, so here's
some review comments. The thread is moving fast, so chances are some of
the comments are obsolete or were already raised in the past.

1) I wonder if we should use WHERE or WHEN to specify the expression.
WHERE is not wrong, but WHEN (as used in triggers) might be better.

2) create_publication.sgml says:

A <literal>NULL</literal> value causes the expression to evaluate
to false; avoid using columns without not-null constraints in the
<literal>WHERE</literal> clause.

That's not quite correct, I think - doesn't the expression evaluate to
NULL (which is not TRUE, so it counts as mismatch)?

I suspect this whole paragraph (talking about NULL in old/new rows)
might be a bit too detailed / low-level for user docs.

3) create_subscription.sgml

<literal>WHERE</literal> clauses, rows must satisfy all expressions
to be copied. If the subscriber is a

I'm rather skeptical about the principle that all expressions have to
match - I'd have expected exactly the opposite behavior, actually.

I see a subscription as "a union of all publications". Imagine for
example you have a data set for all customers, and you create a
publication for different parts of the world, like

CREATE PUBLICATION customers_france
FOR TABLE customers WHERE (country = 'France');

CREATE PUBLICATION customers_germany
FOR TABLE customers WHERE (country = 'Germany');

CREATE PUBLICATION customers_usa
FOR TABLE customers WHERE (country = 'USA');

and now you want to subscribe to multiple publications, because you want
to replicate data for multiple countries (e.g. you want EU countries).
But if you do

CREATE SUBSCRIPTION customers_eu
PUBLICATION customers_france, customers_germany;

then you won't get anything, because each customer belongs to just a
single country. Yes, I could create multiple individual subscriptions,
one for each country, but that's inefficient and may have a different
set of issues (e.g. keeping them in sync when a customer moves between
countries).

I might have missed something, but I haven't found any explanation why
the requirement to satisfy all expressions is the right choice.

IMHO this should be 'satisfies at least one expression' i.e. we should
connect the expressions by OR, not AND.

4) pg_publication.c

It's a bit suspicious we're adding includes for parser to a place where
there were none before. I wonder if this might indicate some layering
issue, i.e. doing something in the wrong place ...

5) publicationcmds.c

I mentioned this in my last review [1] already, but I really dislike the
fact that OpenTableList accepts a list containing one of two entirely
separate node types (PublicationTable or Relation). It was modified to
use IsA() instead of a flag, but I still find it ugly, confusing and
possibly error-prone.

Also, not sure mentioning the two different callers explicitly in the
OpenTableList comment is a great idea - it's likely to get stale if
someone adds another caller.

6) parse_oper.c

I'm having some second thoughts about (not) allowing UDFs ...

Yes, I get that if the function starts failing, e.g. because querying a
dropped table or something, that breaks the replication and can't be
fixed without a resync.

That's pretty annoying, but maybe disallowing anything user-defined
(functions and operators) is maybe overly anxious? Also, extensibility
is one of the hallmarks of Postgres, and disallowing all custom UDF and
operators seems to contradict that ...

Perhaps just explaining that the expression can / can't do in the docs,
with clear warnings of the risks, would be acceptable.

7) exprstate_list

I'd just call the field / variable "exprstates", without indicating the
data type. I don't think we do that anywhere.

8) RfCol

Do we actually need this struct? Why not to track just name or attnum,
and lookup the other value in syscache when needed?

9) rowfilter_expr_checker

* Walk the parse-tree to decide if the row-filter is valid or not.

I don't see any clear explanation what does "valid" mean.

10) WHERE expression vs. data type

Seem ATExecAlterColumnType might need some changes, because changing a
data type for column referenced by the expression triggers this:

test=# alter table t alter COLUMN c type text;
ERROR: unexpected object depending on column: publication of
table t in publication p

11) extra (unnecessary) parens in the deparsed expression

test=# alter publication p add table t where ((b < 100) and (c < 100));
ALTER PUBLICATION
test=# \dRp+ p
Publication p
Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
-------+------------+---------+---------+---------+-----------+----------
user | f | t | t | t | t | f
Tables:
"public.t" WHERE (((b < 100) AND (c < 100)))

12) WHERE expression vs. changing replica identity

Peter Smith already mentioned this in [3], but there's a bunch of places
that need to check the expression vs. replica identity. Consider for
example this:

test=# alter publication p add table t where (b < 100);
ERROR: cannot add relation "t" to publication
DETAIL: Row filter column "b" is not part of the REPLICA IDENTITY

test=# alter table t replica identity full;
ALTER TABLE

test=# alter publication p add table t where (b < 100);
ALTER PUBLICATION

test=# alter table t replica identity using INDEX t_pkey ;
ALTER TABLE

Which means the expression is not covered by the replica identity.

12) misuse of REPLICA IDENTITY

The more I think about this, the more I think we're actually misusing
REPLICA IDENTITY for something entirely different. The whole purpose of
RI was to provide a row identifier for the subscriber.

But now we're using it to ensure we have all the necessary columns,
which is entirely orthogonal to the original purpose. I predict this
will have rather negative consequences.

People will either switch everything to REPLICA IDENTITY FULL, or create
bogus unique indexes with extra columns. Which is really silly, because
it wastes network bandwidth (transfers more data) or local resources
(CPU and disk space to maintain extra indexes).

IMHO this needs more infrastructure to request extra columns to decode
(e.g. for the filter expression), and then remove them before sending
the data to the subscriber.

13) turning update into insert

I agree with Ajin Cherian [4] that looking at just old or new row for
updates is not the right solution, because each option will "break" the
replica in some case. So I think the goal "keeping the replica in sync"
is the right perspective, and converting the update to insert/delete if
needed seems appropriate.

This seems a somewhat similar to what pglogical does, because that may
also convert updates (although only to inserts, IIRC) when handling
replication conflicts. The difference is pglogical does all this on the
subscriber, while this makes the decision on the publisher.

I wonder if this might have some negative consequences, or whether
"moving" this to downstream would be useful for other purposes in the
fuure (e.g. it might be reused for handling other conflicts).

14) pgoutput_row_filter_update

The function name seems a bit misleading, as it suggests might seem like
it updates the row_filter, or something. Should indicate it's about
deciding what to do with the update.

15) pgoutput_row_filter initializing filter

I'm not sure I understand why the filter initialization gets moved from
get_rel_sync_entry. Presumably, most of what the replication does is
replicating rows, so I see little point in not initializing this along
with the rest of the rel_sync_entry.

regards

[1]
https://www.postgresql.org/message-id/849ee491-bba3-c0ae-cc25-4fce1c03f105%40enterprisedb.com

[2]
https://www.postgresql.org/message-id/7106a0fc-8017-c0fe-a407-9466c9407ff8%402ndquadrant.com

[3]
https://www.postgresql.org/message-id/CAHut%2BPukNh_HsN1Au1p9YhG5KCOr3dH5jnwm%3DRmeX75BOtXTEg%40mail.gmail.com

[4]
https://www.postgresql.org/message-id/CAFPTHDb7bpkuc4SxaL9B5vEvF2aEi0EOERdrG%2BxgVeAyMJsF%3DQ%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2021-09-23 12:37:53 Re: Release 14 Schedule
Previous Message Aleksander Alekseev 2021-09-23 12:06:19 Re: mark the timestamptz variant of date_bin() as stable