Re: PG DOCS - logical replication filtering

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PG DOCS - logical replication filtering
Date: 2022-04-11 03:27:30
Message-ID: CAA4eK1JPyVoc1dUjeqbPd9D0_uYxWyyx-8fcsrgiZ5Tpr9OAuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 8, 2022 at 11:42 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Wed, Apr 6, 2022 at 8:58 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
>
> > 3.
> > + <para>
> > + Whenever an <command>UPDATE</command> is processed, the row filter
> > + expression is evaluated for both the old and new row (i.e. before
> > + and after the data is updated).
> > + </para>
> > +
> > + <para>
> > + If both evaluations are <literal>true</literal>, it replicates the
> > + <command>UPDATE</command> change.
> > + </para>
> > +
> > + <para>
> > + If both evaluations are <literal>false</literal>, it doesn't replicate
> > + the change.
> > + </para>
> >
> > I think we can combine these three separate paragraphs.
>
> The first sentence is the explanation, then there are the 3 separate
> “If …” cases mentioned. It doesn’t seem quite right to group just the
> first 2 “if…” cases into one paragraph, while leaving the 3rd one
> separated. OTOH combining everything into one big paragraph seems
> worse. Please confirm if you still want this changed.
>

Yeah, I think we should have something like what Euler's version had
and maybe keep a summary section from the current patch.

> > 4.
> > + <para>
> > + If the publication contains a partitioned table, the publication parameter
> > + <literal>publish_via_partition_root</literal> determines which row filter
> > + is used.
> > + <itemizedlist>
> > +
> > + <listitem>
> > + <para>
> > + If <literal>publish_via_partition_root</literal> is
> > <literal>false</literal>
> > + (default), each <emphasis>partition's</emphasis> row filter is used.
> > + </para>
> > + </listitem>
> > +
> > + <listitem>
> > + <para>
> > + If <literal>publish_via_partition_root</literal> is
> > <literal>true</literal>,
> > + the <emphasis>root partitioned table's</emphasis> row filter is used.
> > + </para>
> > + </listitem>
> > +
> > + </itemizedlist>
> > + </para>
> >
> > I think we can combine this into single para as Euler had in his version.
>
> We can do it, but I am not sure if your review was looking at the
> rendered HTML or just looking at the SGML text? IMO using bullets here
> ended up being more readable (it is also consistent with other bullet
> usages on this page). Please confirm if you still want this changed.
>

Fair enough. We can keep this part as it is.

> > 5.
> > + <note>
> > + <para>
> > + Publication <literal>publish</literal> operations are ignored
> > when copying pre-existing table data.
> > + </para>
> > + </note>
> > +
> > + <note>
> > + <para>
> > + If the subscriber is in a release prior to 15, copy pre-existing data
> > + doesn't use row filters even if they are defined in the publication.
> > + This is because old releases can only copy the entire table data.
> > + </para>
> > + </note>
> >
> > I don't see the need for the first <note> here, the second one seems
> > to convey it.
>
> Well, the 2nd note is only about compatibility with older versions
> doing the subscribe. But the 1st note is not version-specific at all.
> It is saying that the COPY does not take the “publish” option into
> account. If you know of some other docs already mentioning this subtle
> behaviour of the COPY then I can remove this note and just
> cross-reference to the other place. But I did not know anywhere this
> is already mentioned, so that is why I wrote the note about it.
>

I don't see the need to say about general initial sync (COPY) behavior
here. It is already defined at [1]. If we want to enhance, we can do
that as a separate patch to make changes where the initial sync is
explained. I am not sure that is required though.

> >
> > 7.
> > + <para>
> > + The PSQL command <command>\d</command> shows what publications the table is
> > + a member of, as well as that table's row filter expression (if defined) in
> > + those publications.
> > + </para>
> > + <para>
> > + - notice table <literal>t1</literal> is a member of 2 publications, but
> > + has a row filter only in <literal>p1</literal>.
> > + </para>
> > + <para>
> > + - notice table <literal>t2</literal> is a member of 2 publications, and
> > + has a different row filter in each of them.
> >
> > This looks unnecessary to me. Let's remove this part.
>
> I thought something is needed to explain/demonstrate how the user can
> know the different row filters for all the publications that the same
> table is a member of. Otherwise, the user has to guess (??) what
> publications are using their table and then use \dRp+ to dig at all
> those publications to find the row filters.
>
> I can remove all this part from the Examples, but I think at least the
> \d should still be mentioned somewhere. IMO I should put that "PSQL
> commands" section back (which existed in an earlier version) and just
> add a sentence about this. Then this examples part can be removed.
> What do you think?
>

I think the way it is changed in the current patch by moving that
explanation down seems okay to me.

I feel in the initial "Row Filters" and "Row Filter Rules" sections,
we don't need to have separate paragraphs. I think the same is pointed
out by Alvaro as well.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2022-04-11 03:58:23 Re: generic plans and "initial" pruning
Previous Message Amit Langote 2022-04-11 03:05:19 Re: generic plans and "initial" pruning