Re: row filtering for logical replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-07-13 19:15:48
Message-ID: 4d9e0142-0e80-daee-5236-154cd785828c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/13/21 12:57 PM, Amit Kapila wrote:
> On Tue, Jul 13, 2021 at 10:24 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> On Mon, Jul 12, 2021 at 3:01 PM Tomas Vondra
>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>>> In terms of implementation, I think there are two basic options - either
>>> we can define a new "expression" type in gram.y, which would be a subset
>>> of a_expr etc. Or we can do it as some sort of expression walker, kinda
>>> like what the transform* functions do now.
>>>
>>
>> I think it is better to use some form of walker here rather than
>> extending the grammar for this. However, the question is do we need
>> some special kind of expression walker here or can we handle all
>> required cases via transformWhereClause() call as the patch is trying
>> to do. AFAIU, the main things we want to prohibit in the filter are:
>> (a) it doesn't refer to any relation other than catalog in where
>> clause, (b) it doesn't use UDFs in any way (in expressions, in
>> user-defined operators, user-defined types, etc.), (c) the columns
>> referred to in the filter should be part of PK or Replica Identity.
>> Now, if all such things can be detected by the approach patch has
>> taken then why do we need a special kind of expression walker? OTOH,
>> if we can't detect some of this then probably we can use a special
>> walker.
>>
>> I think in the long run one idea to allow UDFs is probably by
>> explicitly allowing users to specify whether the function is
>> publication predicate safe and if so, then we can allow such functions
>> in the filter clause.
>>
>
> Another idea here could be to read the publication-related catalog
> with the latest snapshot instead of a historic snapshot. If we do that
> then if the user faces problems as described by Petr [1] due to
> missing dependencies via UDFs then she can Alter the Publication to
> remove/change the filter clause and after that, we would be able to
> recognize the updated filter clause and the system will be able to
> move forward.
>
> I might be missing something but reading publication catalogs with
> non-historic snapshots shouldn't create problems as we use the
> historic snapshots are required to decode WAL.
>

IMHO the best option for v1 is to just restrict the filters to
known-safe expressions. That is, just built-in operators, no UDFs etc.
Yes, it's not great, but both alternative proposals (allowing UDFs or
using current snapshot) are problematic for various reasons.

Even with those restrictions the row filtering seems quite useful, and
we can relax those restrictions later if we find acceptable compromise
and/or decide it's worth the risk. Seems better than having to introduce
new restrictions later.

> I think the problem described by Petr[1] is also possible today if the
> user drops the publication and there is a corresponding subscription,
> basically, the system will stuck with error: "ERROR: publication
> "mypub" does not exist. I think allowing to use non-historic snapshots
> just for publications will resolve that problem as well.
>
> [1] - https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com
>

That seems like a completely different problem, TBH. For example the
slot is dropped too, which means the WAL is likely gone etc.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2021-07-13 19:21:17 Re: row filtering for logical replication
Previous Message Tomas Vondra 2021-07-13 19:07:38 Re: row filtering for logical replication