Re: bogus: logical replication rows/cols combinations

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: bogus: logical replication rows/cols combinations
Date: 2022-05-17 03:26:21
Message-ID: CAA4eK1+sw3oJY8RZWU4BFjLoOU2NqZVgrJ-We_10MCm4F444zQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 16, 2022 at 6:50 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-May-16, Amit Kapila wrote:
>
> > > Agreed. If we can get columnlist and rowfilter from pg_publication_tables, it
> > > will be more convenient. And I think users that want to fetch the columnlist
> > > and rowfilter of table can also benefit from it.
> >
> > After the change for this, we will give an error on combining
> > publications where one of the publications specifies all columns in
> > the table and the other doesn't provide any columns. We should not
> > give an error as both mean all columns.
>
> But don't we need to behave the same way for both column lists and row
> filters? I understand that some cases with different row filters for
> different publications have shown to have weird behavior, so I think
> it'd make sense to restrict it in the same way.
>

I think the case where we are worried about row filter behavior is for
initial table sync where we ignore publication actions and that is
true with and without row filters. See email [1]. We are planning to
document that behavior as a separate patch. The idea we have used for
row filters is similar to what IBM DB2 [2] and Oracle [3] uses where
they allow combining filters with pub-action (operation (insert,
update, delete) in their case).

I think both column lists and row filters have a different purpose and
we shouldn't try to make them behave in the same way. The main purpose
of introducing a column list is to have statically different shapes on
publisher and subscriber or hide sensitive column data. In both cases,
it doesn't seem to make sense to combine column lists and we didn't
find any other database doing so. OTOH, for row filters, it makes
sense to combine filters for each pub-action as both IBM DB2 and
Oracle seems to be doing.

>
> > 3.
> > +
> > + /*
> > + * We only throw a warning here so that the subcription can still be
> > + * created and let user aware that something is going to fail later and
> > + * they can fix the publications afterwards.
> > + */
> > + if (list_member(tablelist, rv))
> > + ereport(WARNING,
> > + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > + errmsg("cannot use different column lists for table \"%s.%s\" in
> > different publications",
> > + nspname, relname));
> >
> > Can we extend this comment to explain the case where after Alter
> > Publication, if the user dumps and restores back the subscription,
> > there is a possibility that "CREATE SUBSCRIPTION" won't work if we
> > give ERROR here instead of WARNING?
>
> Yeah, and not only the comment — I think we need to have more in the
> warning message itself.
>

But as mentioned by Hou-San in his last email (pg_dump of subscription
always set (connect = false) which means it won't try to fetch column
list), I think we don't need to give a WARNING here, instead, we can
use ERROR. So, do we need the extra DETAIL (The subscription "..."
cannot currently be used for replication.) as that is implicit for the
ERROR case?

>
> I think this whole affair is a bit sad TBH and I'm sure it'll give us
> some grief -- similar to replication slots becoming inactive and nobody
> noticing. A user changing a publication in a way that prevents some
> replica from working and the warnings are hidden, they could have
> trouble noticing that the replica is stuck.
>

I agree and it seems this is the best we can do for now.

[1] - https://www.postgresql.org/message-id/CAA4eK1L_98LF7Db4yFY1PhKKRzoT83xtN41jTS5X%2B8OeGrAkLw%40mail.gmail.com
[2] - https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-log-record-variables
[3] - https://docs.oracle.com/en/cloud/paas/goldengate-cloud/gwuad/selecting-and-filtering-rows.html#GUID-11296A70-D953-4426-8EAA-37C2B4432446

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-05-17 03:31:24 Re: Backends stunk in wait event IPC/MessageQueueInternal
Previous Message shiy.fnst@fujitsu.com 2022-05-17 03:25:29 RE: bogus: logical replication rows/cols combinations