Re: Column Filtering in Logical Replication

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-01-18 10:33:19
Message-ID: ab205c51-9e47-9ad6-d208-7168269e5b2a@enterprisedb.com
Lists: pgsql-hackers

On 15.01.22 04:45, Amit Kapila wrote:
>>> I think another issue w.r.t column filter patch is that even while
>>> creating publication (even for 'insert' publications) it should check
>>> that all primary key columns must be part of published columns,
>>> otherwise, it can fail while applying on subscriber as it will try to
>>> insert NULL for the primary key column.
>>
>> I'm not so sure about the primary key aspects, actually; keep in mind
>> that the replica can have a different table definition, and it might
>> have even a completely different primary key. I think this part is up
>> to the user to set up correctly; we have enough with just trying to make
>> the replica identity correct.
>
> But OTOH, the primary key is also considered default replica identity,
> so I think users will expect it to work. You are right this problem
> can also happen if the user defined a different primary key on a
> replica but that is even a problem in HEAD (simple inserts will fail)
> but I am worried about the case where both the publisher and
> subscriber have the same primary key as that works in HEAD.

This would seem to be a departure from the current design of logical
replication. It's up to the user to arrange things so that data can be
applied in general. Otherwise, if the default assumption were that the
schema is the same on both sides, then column filtering shouldn't exist
at all, since it would necessarily break that assumption.

Maybe there could be a strict mode or something that has more checks,
but that would be a separate feature. The existing behavior is that you
can publish anything you want and it's up to you to make sure the
receiving side can store it.
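
To make the case under discussion concrete, here is a rough sketch of
what happens on the receiving side when the published column list omits
a key column. It is only a model: an in-memory SQLite table stands in
for the subscriber, and apply_insert, make_subscriber, the table t, and
its columns are all made up for illustration. None of this is
PostgreSQL's apply code or the patch's behavior.

```python
import sqlite3


def make_subscriber():
    """Create a stand-in subscriber table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id TEXT NOT NULL PRIMARY KEY, name TEXT, note TEXT)"
    )
    return conn


def apply_insert(conn, row, published_columns):
    """Apply one replicated INSERT, carrying only the published columns.

    Nothing here checks that the key column is published; as in the
    existing design, it is up to the user to make the row storable.
    """
    cols = [c for c in row if c in published_columns]
    sql = "INSERT INTO t ({}) VALUES ({})".format(
        ", ".join(cols), ", ".join("?" for _ in cols)
    )
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, [row[c] for c in cols])
        return "applied"
    except sqlite3.IntegrityError as exc:
        return "failed: {}".format(exc)


row = {"id": "42", "name": "alice", "note": "internal"}

# Column list includes the key: the row can be stored.
ok = apply_insert(make_subscriber(), row, {"id", "name"})

# Column list omits the key: the subscriber would have to store NULL in
# a NOT NULL primary key column, so the apply fails.
bad = apply_insert(make_subscriber(), row, {"name", "note"})

print(ok)
print(bad)
```

The second call fails only because this particular subscriber table
requires id. A subscriber with a different table definition or a
different key could accept the same row, which is why the publisher
cannot decide this on its own.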
