Re: Column Filtering in Logical Replication

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Column Filtering in Logical Replication
Date: 2022-01-07 11:46:35
Message-ID: 2d6c8b74-bdef-767b-bdb6-29705985ed9c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I think this is getting pretty good now. I like the overall behavior now.

Some details:

There are still a few references to "filter", but I see most of the
patch now uses column list or something. Maybe do another cleanup
pass before finalizing the patch.

doc/src/sgml/catalogs.sgml needs to be updated.

doc/src/sgml/ref/alter_publication.sgml:

"allows to change" -> "allows changing"

src/backend/catalog/pg_publication.c:

publication_translate_columns(): I find the style of having a couple
of output arguments plus a return value that is actually another
output value confusing. (It would be different if the return value
was some kind of success value.) Let's make it all output arguments.

About the XXX question there: I would make the column numbers always
sorted. I don't have a strong reason for this, but otherwise we might
get version differences, unstable dumps etc. It doesn't seem
complicated to keep this a bit cleaner.

I think publication_translate_columns() also needs to prohibit
generated columns. We already exclude those implicitly throughout the
logical replication code, but if a user explicitly set one here,
things would probably break.

src/backend/commands/tablecmds.c:

ATExecReplicaIdentity(): Regarding the question of how to handle
REPLICA_IDENTITY_NOTHING: I see two ways to do this. Right now, the
approach is that the user can set the replica identity freely, and we
decide later based on that what we can replicate (e.g., no updates).
For this patch, that would mean we don't restrict what columns can be
in the column list, but we check what actions we can replicate based
on the column list. The alternative is that we require the column
list to include the replica identity, as the patch is currently doing,
which would mean that REPLICA_IDENTITY_NOTHING can be allowed since
it's essentially a set of zero columns.

I find the current behavior a bit weird on reflection. If a user
wants to replicate on some columns and only INSERTs, that should be
allowed regardless of what the replica identity columns are.

src/backend/replication/pgoutput/pgoutput.c:

In get_rel_sync_entry(), why did you remove the block

- if (entry->pubactions.pubinsert &&
entry->pubactions.pubupdate &&
- entry->pubactions.pubdelete &&
entry->pubactions.pubtruncate)
- break;

Maybe this is intentional, but it's not clear to me.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2022-01-07 11:58:19 Re: RFC: Logging plan of the running query
Previous Message Michael Paquier 2022-01-07 11:08:57 Re: \dP and \dX use ::regclass without "pg_catalog."