Re: bogus: logical replication rows/cols combinations

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: bogus: logical replication rows/cols combinations
Date: 2022-05-06 12:26:27
Message-ID: 527910b5-2530-1b60-5a08-4930b1bf8647@enterprisedb.com
Lists: pgsql-hackers

On 5/6/22 05:23, houzj(dot)fnst(at)fujitsu(dot)com wrote:
> On Tuesday, May 3, 2022 11:31 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> On Tue, May 3, 2022 at 12:10 AM Tomas Vondra
>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>
>>> On 5/2/22 19:51, Alvaro Herrera wrote:
>>>>> Why would we need to know publications replicated by other
>>>>> walsenders?
>>>>> And what if the subscriber is not connected at the moment? In that case
>>>>> there'll be no walsender.
>>>>
>>>> Sure, if the replica is not connected then there's no issue -- as you
>>>> say, that replica will fail at START_REPLICATION time.
>>>>
>>>
>>> Right, I got confused a bit.
>>>
>>> Anyway, I think the main challenge is defining what exactly we want to
>>> check, in order to ensure "sensible" behavior, without preventing way
>>> too many sensible use cases.
>>>
>>
>> I could think of below two options:
>> 1. Forbid any case where column list is different for the same table
>> when combining publications.
>> 2. Forbid if the column list and row filters for a table are different
>> in the set of publications we are planning to combine. This means we
>> will allow combining column lists when row filters are not present or
>> when column list is the same (we don't get anything additional by
>> combining but the idea is we won't forbid such cases) and row filters
>> are different.
>>
>> Now, I think the points in favor of (1) are that the main purpose of
>> introducing a column list are: (a) the structure/schema of the
>> subscriber is different from the publisher, (b) want to hide sensitive
>> columns data. In both cases, it should be fine if we follow (1) and
>> from Peter E.'s latest email [1] he also seems to be indicating the
>> same. If we want to be slightly more relaxed then we can probably do (2).
>> We can decide on something else as well but I feel it should be such
>> that it is easy to explain.
>
> I also think it makes sense to add a restriction like (1). I am planning to
> implement the restriction if no one objects.
>

I'm not going to block that approach if that's the consensus here,
though I'm not convinced.

Let me point out that (1) does *not* work for the data redaction use case,
certainly not for the example Alvaro and I presented, because that relies
on a combination of row filters and column lists. Requiring all column
lists to be the same (and not specific to a row filter) prevents that
example from working. Yes, you can create multiple subscriptions, but
that brings its own set of challenges too.
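
To make the difference concrete, here is a minimal Python sketch of the two
proposed checks as I read them from the quoted text. This is not PostgreSQL
code: the PubEntry type, the function names, and the example columns and
filters are all hypothetical, and the reading of option (2) in particular is
my interpretation of the wording above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class PubEntry:
    """One publication's settings for a single table (hypothetical model)."""
    columns: Optional[FrozenSet[str]]  # None means "all columns"
    row_filter: Optional[str]          # None means "no row filter"


def allowed_by_option_1(entries: Sequence[PubEntry]) -> bool:
    """Option (1): all combined publications must use the same column list."""
    return len({e.columns for e in entries}) <= 1


def allowed_by_option_2(entries: Sequence[PubEntry]) -> bool:
    """Option (2), as read here: different column lists are acceptable only
    when no row filters are present; identical column lists are always fine."""
    same_columns = len({e.columns for e in entries}) <= 1
    no_row_filters = all(e.row_filter is None for e in entries)
    return same_columns or no_row_filters


# Redaction-style setup: full rows for public data, only the key otherwise.
redaction = [
    PubEntry(frozenset({"id", "owner", "notes"}), "is_public"),
    PubEntry(frozenset({"id"}), "NOT is_public"),
]

# Same column list, different row filters.
same_cols_diff_filters = [
    PubEntry(frozenset({"id", "owner"}), "region = 'EU'"),
    PubEntry(frozenset({"id", "owner"}), "region = 'US'"),
]

# Different column lists, no row filters at all.
diff_cols_no_filters = [
    PubEntry(frozenset({"id"}), None),
    PubEntry(frozenset({"id", "owner"}), None),
]

for name, case in [
    ("redaction", redaction),
    ("same_cols_diff_filters", same_cols_diff_filters),
    ("diff_cols_no_filters", diff_cols_no_filters),
]:
    print(name, allowed_by_option_1(case), allowed_by_option_2(case))
```

In this model the redaction-style combination is rejected by the option (1)
check, which is the point made above; under this reading of option (2) it is
rejected as well, since both the column lists and the row filters differ.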

I doubt forcing users to use the more complex setup is a good idea, and
combining the column lists per [1] seems sound to me.

That being said, the good thing is that it seems this restriction could be
relaxed in the future to work per [1], without causing any backwards
compatibility issues.

Should we do something similar for row filters, though? It seems quite
weird we're so concerned about unexpected behavior due to combining
column lists (despite having a patch that makes it behave sanely), and
at the same time wave off similarly strange behavior due to combining
row filters because "that's what you get if you define the publications
in a strange way".

regards

[1]
https://www.postgresql.org/message-id/5a85b8b7-fc1c-364b-5c62-0bb3e1e25824%40enterprisedb.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
