Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-07-13 06:59:08
Message-ID: CAHut+PuLoZuHD_A=n8GshC84Nc=8guReDsTmV1RFsCYojssD8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Euler,

Greg noticed that your patch set was missing any implementation of the
psql tab auto-complete for the new row filter WHERE syntax.

So I have added a POC patch for this missing feature.

Unfortunately, there is an existing HEAD problem overlapping with this
exact same code. I reported this already in another thread [1].

So there are 2 patches attached here:
0001 - Fixes the other reported problem (I hope this may be pushed soon)
0002 - Adds the tab-completion code for your row filter WHERE's

------
[1] https://www.postgresql.org/message-id/CAHut+Ps-vkmnWAShWSRVCB3gx8aM=bFoDqWgBNTzofK0q1LpwA@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

On Tue, Jul 13, 2021 at 1:25 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Mon, Jul 12, 2021 at 5:39 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> >
> > On Fri, Jul 2, 2021, at 4:29 AM, Peter Smith wrote:
> >
> > Hi.
> >
> > I have been looking at the latest patch set (v16). Below are my review
> > comments and some patches.
> >
> > Peter, thanks for your detailed review. Comments are inline.
> >
>
> Hi Euler,
>
> Thanks for addressing my previous review comments.
>
> I have reviewed the latest v18 patch. Below are some more review
> comments and patches.
>
> (The patches 0003,0004 are just examples of what is mentioned in my
> comments; The patches 0001,0002 are there only to try to keep cfbot
> green).
>
> //////////
>
> 1. Commit comment - wording
>
> "When a publication is defined or modified, rows that don't satisfy a
> WHERE clause may be
> optionally filtered out."
>
> =>
>
> I think this means to say: "Rows that don't satisfy an optional WHERE
> clause will be filtered out."
>
> ------
>
> 2. Commit comment - wording
>
> "The row filter is per table, which allows different row filters to be
> defined for different tables."
>
> =>
>
> I think all that is the same as just saying: "The row filter is per table."
>
> ------
>
> 3. PG docs - independent improvement
>
> You wrote (ref [1] point 3):
>
> "I agree it can be confusing. BTW, CREATE PUBLICATION does not mention that the
> root partitioned table is used. We should improve that sentence too."
>
> I agree, but that PG docs improvement is independent of your RowFilter
> patch; please make another thread for that idea.
>
> ------
>
> 4. doc/src/sgml/ref/create_publication.sgml - independent improvement
>
> @@ -131,9 +135,9 @@ CREATE PUBLICATION <replaceable
> class="parameter">name</replaceable>
> on its partitions) contained in the publication will be published
> using the identity and schema of the partitioned table rather than
> that of the individual partitions that are actually changed; the
> - latter is the default. Enabling this allows the changes to be
> - replicated into a non-partitioned table or a partitioned table
> - consisting of a different set of partitions.
> + latter is the default (<literal>false</literal>). Enabling this
> + allows the changes to be replicated into a non-partitioned table or a
> + partitioned table consisting of a different set of partitions.
> </para>
>
> I think that Tomas wrote (ref [2] point 2) that this change seems
> unrelated to your RowFilter patch.
>
> I agree; I liked the change, but IMO you need to propose this one in
> another thread too.
>
> ------
>
> 5. doc/src/sgml/ref/create_subscription.sgml - wording
>
> @@ -102,7 +102,16 @@ CREATE SUBSCRIPTION <replaceable
> class="parameter">subscription_name</replaceabl
> <para>
> Specifies whether the existing data in the publications that are
> being subscribed to should be copied once the replication starts.
> - The default is <literal>true</literal>.
> + The default is <literal>true</literal>. If any table in the
> + publications has a <literal>WHERE</literal> clause, rows that do not
> + satisfy the <replaceable class="parameter">expression</replaceable>
> + will not be copied. If the subscription has several publications in
> + which a table has been published with different
> + <literal>WHERE</literal> clauses, rows must satisfy all expressions
> + to be copied. If any table in the publications has a
> + <literal>WHERE</literal> clause, data synchronization does not use it
> + if the subscriber is a <productname>PostgreSQL</productname> version
> + before 15.
>
> I felt that the sentence: "If any table in the publications has a
> <literal>WHERE</literal> clause, data synchronization does not use it
> if the subscriber is a <productname>PostgreSQL</productname> version
> before 15."
>
> Could be expressed more simply like: "If the subscriber is a
> <productname>PostgreSQL</productname> version before 15 then any row
> filtering is ignored."
>
> ------
>
> 6. src/backend/commands/publicationcmds.c - wrong function comment
>
> @@ -585,6 +611,9 @@ OpenTableList(List *tables)
>
> /*
> * Close all relations in the list.
> + *
> + * Publication node can have a different list element, hence, pub_drop_table
> + * indicates if it has a Relation (true) or PublicationTable (false).
> */
> static void
> CloseTableList(List *rels)
>
> =>
>
> The 2nd parameter does not exist in v18, so that comment about
> pub_drop_table seems to be a cut/paste error from the OpenTableList.
>
> ------
>
> src/backend/replication/logical/tablesync.c - bug ?
>
> @@ -829,16 +883,23 @@ copy_table(Relation rel)
> relmapentry = logicalrep_rel_open(lrel.remoteid, NoLock);
> Assert(rel == relmapentry->localrel);
>
> + /* List of columns for COPY */
> + attnamelist = make_copy_attnamelist(relmapentry);
> +
> /* Start copy on the publisher. */
> =>
>
> I did not understand the above call to make_copy_attnamelist. The
> result seems unused before it is overwritten later in this same
> function (??)
>
> ------
>
> 7. src/backend/replication/logical/tablesync.c -
> fetch_remote_table_info enhancement
>
> + /* Get relation qual */
> + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 150000)
> + {
> + resetStringInfo(&cmd);
> + appendStringInfo(&cmd,
> + "SELECT pg_get_expr(prqual, prrelid) "
> + " FROM pg_publication p "
> + " INNER JOIN pg_publication_rel pr "
> + " ON (p.oid = pr.prpubid) "
> + " WHERE pr.prrelid = %u "
> + " AND p.pubname IN (", lrel->remoteid);
>
> =>
>
> I think a small improvement is possible in this SQL.
>
> If we change that to "SELECT DISTINCT pg_get_expr(prqual, prrelid)"...
> then it avoids the copy SQL from having multiple WHERE clauses which
> are all identical. This could happen when subscribed to multiple
> publications which had the same filter for the same table.
>
> I attached a tmp POC patch for this change and it works as expected.
> For example, I subscribe to 3 publications, but 2 of them have the
> same filter for the table.
>
> BEFORE
> COPY (SELECT key, value, data FROM public.test WHERE (key > 0) AND
> (key > 1000) AND (key > 1000)) TO STDOUT
>
> AFTER
> COPY (SELECT key, value, data FROM public.test WHERE (key > 0) AND
> (key > 1000) ) TO STDOUT
>
> ------
>
> 8. src/backend/replication/pgoutput/pgoutput.c - qual member is redundant
>
> @@ -99,6 +108,9 @@ typedef struct RelationSyncEntry
>
> bool replicate_valid;
> PublicationActions pubactions;
> + List *qual; /* row filter */
> + List *exprstate; /* ExprState for row filter */
> + TupleTableSlot *scantuple; /* tuple table slot for row filter */
>
> =>
>
> Now that the exprstate is introduced I think that the other member
> "qual" is redundant, so it can be removed.
>
> FYI - I attached a tmp patch with all the qual references deleted and
> everything is fine.
>
> ------
>
> 9. src/backend/replication/pgoutput/pgoutput.c - comment typo?
>
> + /*
> + * Cache ExprState using CacheMemoryContext. This is the same code as
> + * ExecPrepareExpr() but it is not used because it doesn't use an EState.
> + * It should probably be another function in the executor to handle the
> + * execution outside a normal Plan tree context.
> + */
>
> =>
>
> typo: it/that ?
>
> I think it ought to say "This is the same code as ExecPrepareExpr()
> but that is not used because"...
>
> ------
>
> 10. src/backend/replication/pgoutput/pgoutput.c - redundant debug logging?
>
> + /* Evaluates row filter */
> + result = pgoutput_row_filter_exec_expr(exprstate, ecxt);
> +
> + elog(DEBUG3, "row filter %smatched", result ? "" : "not ");
>
> The above debug logging is really only a repeat (with different
> wording) of the same information already being logged inside the
> pgoutput_row_filter_exec_expr function isn't it? Consider removing the
> redundant logging.
>
> e.g. This is already getting logged by pgoutput_row_filter_exec_expr:
>
> elog(DEBUG3, "row filter evaluates to %s (isnull: %s)",
> DatumGetBool(ret) ? "true" : "false",
> isnull ? "true" : "false");
>
>
> ------
> [1] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com
> [2] https://www.postgresql.org/message-id/849ee491-bba3-c0ae-cc25-4fce1c03f105%40enterprisedb.com
> [3] https://www.postgresql.org/message-id/532a18d8-ce90-4444-8570-8a9fcf09f329%40www.fastmail.com
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia

Attachment Content-Type Size
v18-0001-fix-tab-auto-complete-CREATE-PUBLICATION.patch application/octet-stream 1.6 KB
v18-0002-tmp-Add-tab-auto-complete-support-for-the-Row-Fi.patch application/octet-stream 2.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2021-07-13 06:59:56 RE: Failed transaction statistics to measure the logical replication progress
Previous Message Yugo NAGATA 2021-07-13 06:50:52 Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors