Re: row filtering for logical replication

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Peter Smith" <smithpb2250(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Greg Nancarrow" <gregn4422(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "vignesh C" <vignesh21(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "Ajin Cherian" <itsajin(at)gmail(dot)com>, "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>, "Rahila Syed" <rahilasyed90(at)gmail(dot)com>, "Peter Eisentraut" <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "David Steele" <david(at)pgmasters(dot)net>, "Craig Ringer" <craig(at)2ndquadrant(dot)com>, "Amit Langote" <amitlangote09(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-06 01:19:27
Message-ID: b676aef0-00c7-4c19-85f8-33786594e807@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 3, 2021, at 8:12 PM, Euler Taveira wrote:
> PS> I will update the commit message in the next version. I barely changed the
> documentation to reflect the current behavior. I probably missed some changes
> but I will fix in the next version.
I realized that I forgot to mention a few things about the UPDATE behavior.
Regardless of 0003, we need to define which tuple will be used to evaluate the
row filter for UPDATEs. We already discussed it circa [1]. This current version
chooses *new* tuple. Is it the best choice?

Let's check all cases. There are 2 rows on the provider. One row satisfies the
row filter and the other one doesn't. For each case, I expect the initial rows
to be there (no modifications). The DDLs are:

CREATE TABLE foo (a integer, b text, PRIMARY KEY(a));
INSERT INTO foo (a, b) VALUES(10, 'abc'),(30, 'abc');
CREATE PUBLICATION bar FOR TABLE foo WHERE (a > 20);

The table describes what happen on the subscriber. BEFORE is the current row on
subscriber. OLD, NEW and OLD & NEW are action/row if we consider different ways
to evaluate the row filter.

-- case 1: old tuple (10, abc) ; new tuple (10, def)
UPDATE foo SET b = 'def' WHERE a = 10;

+-----------+--------------------+------------------+------------------+
| BEFORE | OLD | NEW | OLD & NEW |
+-----------+--------------------+------------------+------------------+
| NA | NA | NA | NA |
+-----------+--------------------+------------------+------------------+

If the old and new tuple don't satisfy the row filter, there is no issue.

-- case 2: old tuple (30, abc) ; new tuple (30, def)
UPDATE foo SET b = 'def' WHERE a = 30;

+-----------+--------------------+------------------+------------------+
| BEFORE | OLD | NEW | OLD & NEW |
+-----------+--------------------+------------------+------------------+
| (30, abc) | UPDATE (30, def) | UPDATE (30, def) | UPDATE (30, def) |
+-----------+--------------------+------------------+------------------+

If the old and new tuple satisfy the row filter, there is no issue.

-- case 3: old tuple (30, abc) ; new tuple (10, def)
UPDATE foo SET a = 10, b = 'def' WHERE a = 30;

+-----------+--------------------+------------------+------------------+
| BEFORE | OLD | NEW | OLD & NEW |
+-----------+--------------------+------------------+------------------+
| (30, abc) | UPDATE (10, def) * | KEEP (30, abc) * | KEEP (30, abc) * |
+-----------+--------------------+------------------+------------------+

If the old tuple satisfies the row filter but the new tuple doesn't, we have a
data consistency issue. Since the old tuple satisfies the row filter, the
initial table synchronization copies this row. However, after the UPDATE the
new tuple doesn't satisfy the row filter then, from the data consistency
perspective, that row should be removed on the subscriber.

The OLD sends the UPDATE because it satisfies the row filter (if it is a
sharding solution this new row should be moved to another node). The new row
would likely not be modified by replication again. That's a data inconsistency
according to the row filter.

The NEW and OLD & NEW don't send the UPDATE because it doesn't satisfy the row
filter. Keep the old row is undesirable because it doesn't reflect what we have
on the source. This row on the subscriber would likely not be modified by
replication again. If someone inserted a new row with a = 30, replication will
stop because there is already a row with that value.

-- case 4: old tuple (10, abc) ; new tuple (30, def)
UPDATE foo SET a = 30, b = 'def' WHERE a = 10;

+-----------+--------------------+------------------+------------------+
| BEFORE | OLD | NEW | OLD & NEW |
+-----------+--------------------+------------------+------------------+
| NA | NA ! | NA ! | NA |
+-----------+--------------------+------------------+------------------+

The OLD and OLD & NEW don't send the UPDATE because it doesn't satisfy the row
filter. The NEW sends the UPDATE because it satisfies the row filter but there
is no row to modify. The current behavior does nothing. However, it should
INSERT the new tuple. Subsequent UPDATE or DELETE have no effect. It could be a
surprise for an application that expects the same data set from the provider.

If we have to choose the default behavior I would say use the old tuple for
evaluates row filter. Why? The validation already restricts the columns to
replica identity so there isn't an issues with missing (NULL) columns. The case
3 updates the row with a value that is not consistent but keeping the old row
is worse because it could stop the replication if someone inserted the old key
in a new row on the provider. The case 4 ignores the UPDATE if it cannot find
the tuple but it could provide an error if there was an strict mode.

Since this change is very simple to revert, this new version contains this
modification. I also improve the documentation, remove extra parenthesis from
psql/pg_dump. As I said in the previous email, I merged the validation patch too.

FWIW in the previous version, I removed a code that compares nodes to decide if
it is necessary to remove the publication-relation entry. I had a similar code
in a ancient version of this patch but decided that the additional code is not
worth.

There is at least one issue in the current code that should be addressed: PK or
REPLICA IDENTITY modification could break the publication check for UPDATEs and
DELETEs.

[1] https://postgr.es/m/202107162135.m5ehijgcasjk@alvherre.pgsql

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachment Content-Type Size
0001-Row-filter-for-logical-replication.patch text/x-patch 97.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-12-06 01:47:30 Re: [PATCH] Don't block HOT update by BRIN index
Previous Message Greg Nancarrow 2021-12-06 01:10:29 Re: On login trigger: take three