RE: Column Filtering in Logical Replication

From: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Column Filtering in Logical Replication
Date: 2022-03-15 02:08:50
Message-ID: TYAPR01MB6315D664D926EF66DD6E91FCFD109@TYAPR01MB6315.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 14, 2022 5:08 AM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 3/12/22 05:30, Amit Kapila wrote:
> >> ...
> >
> > Okay, please find attached. I have done basic testing of this, if we
> > agree with this approach then this will require some more testing.
> >
>
> Thanks, the proposed changes seem like a clear improvement, so I've
> added them, with some minor tweaks (mostly to comments).
>
> I've also included the memory context rename (entry_changes to the
> change proposed by Wang Wei, using a single SQL command in tablesync.
>
> And I've renamed the per-entry memory context to entry_cxt, and used it
> for the column list.
>

Thanks for your patch.
Here are some comments for column filter main patch (0003 patch).

1. doc/src/sgml/catalogs.sgml
@@ -6263,6 +6263,19 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
Reference to schema
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>prattrs</structfield> <type>int2vector</type>
+ (references <link linkend="catalog-pg-attribute"><structname>pg_attribute</structname></link>.<structfield>attnum</structfield>)
+ </para>
+ <para>
+ This is an array of values that indicates which table columns are
+ part of the publication. For example, a value of <literal>1 3</literal>
+ would mean that the first and the third table columns are published.
+ A null value indicates that all columns are published.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>

This change was added to pg_publication_namespace view. I think it should be
added to pg_publication_rel view, right?

2. src/backend/replication/pgoutput/pgoutput.c
@@ -188,6 +202,7 @@ static EState *create_estate_for_relation(Relation rel);
static void pgoutput_row_filter_init(PGOutputData *data,
List *publications,
RelationSyncEntry *entry);
+
static bool pgoutput_row_filter_exec_expr(ExprState *state,
ExprContext *econtext);
static bool pgoutput_row_filter(Relation relation, TupleTableSlot *old_slot,

Should we remove this change?

3. src/backend/commands/publicationcmds.c
+/*
+ * Check if all columns referenced in the column list are part of the
+ * REPLICA IDENTITY index or not.
+ *
+ * Returns true if any invalid column is found.
+ */

The comment for pub_collist_contains_invalid_column() seems wrong. Should it be
"Check if all REPLICA IDENTITY columns are covered by the column list or not"?

4.
The patch doesn't allow delete and update operations if the target table uses
replica identity full and it is published with column list specified, even if
column list includes all columns in the table.

For example:
create table tbl (a int, b int, c int);
create publication pub for table tbl (a, b, c);
alter table tbl replica identity full;

postgres=# delete from tbl;
ERROR: cannot delete from table "tbl"
DETAIL: Column list used by the publication does not cover the replica identity.

Should we allow this case? I think it doesn't seem to cause harm.

5.
Maybe we need some changes for tab-complete.c.

Regards,
Shi yu

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-03-15 02:37:25 Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)
Previous Message Michael Paquier 2022-03-15 02:04:26 Re: add checkpoint stats of snapshot and mapping files of pg_logical dir