Re: Column Filtering in Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-01-06 09:41:30
Message-ID: CAA4eK1JG9sg8FEx=gKVdtKNA6UpGzqkW0DEftcByP_DoSAHN4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 27, 2021 at 10:36 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> Determining that an array has a NULL element seems convoluted. I ended
> up with this query, where comparing the result of array_positions() with
> an empty array does that. If anybody knows of a simpler way, or any
> situations in which this fails, I'm all ears.
>
> with published_cols as (
> select case when
> pg_catalog.array_positions(pg_catalog.array_agg(unnest), null) <> '{}' then null else
> pg_catalog.array_agg(distinct unnest order by unnest) end AS attrs
> from pg_catalog.pg_publication p join
> pg_catalog.pg_publication_rel pr on (p.oid = pr.prpubid) left join
> unnest(prattrs) on (true)
> where prrelid = 38168 and p.pubname in ('pub1', 'pub2')
> )
> SELECT a.attname,
> a.atttypid,
> a.attnum = ANY(i.indkey)
> FROM pg_catalog.pg_attribute a
> LEFT JOIN pg_catalog.pg_index i
> ON (i.indexrelid = pg_get_replica_identity_index(38168)),
> published_cols
> WHERE a.attnum > 0::pg_catalog.int2
> AND NOT a.attisdropped and a.attgenerated = ''
> AND a.attrelid = 38168
> AND (published_cols.attrs IS NULL OR attnum = ANY(published_cols.attrs))
> ORDER BY a.attnum;
>
> This returns all columns if at least one publication has a NULL prattrs,
> or only the union of columns listed in all publications, if all
> publications have a list of columns.
>

Considering this, don't we need to deal with "For All Tables" and "For
All Tables In Schema .." Publications in this query? The row filter
patch deal with such cases. The row filter patch handles the NULL case
via C code which makes the query relatively simpler. I am not sure if
the same logic can be used here but having a simple query here have
merit that if we want to use a single query to fetch both column and
row filters then we should be able to enhance it without making it
further complicated.

> (I was worried about obtaining the list of publications, but it turns
> out that it's already as a convenient list of OIDs in the MySubscription
> struct.)
>
> With this, we can remove the second query added by Rahila's original patch to
> filter out nonpublished columns.
>
> I still need to add pg_partition_tree() in order to search for
> publications containing a partition ancestor. I'm not yet sure what
> happens (and what *should* happen) if an ancestor is part of a
> publication and the partition is also part of a publication, and the
> column lists differ.
>

Shouldn't we try to have a behavior similar to the row filter patch
for this case? The row filter patch behavior is as follows: "If your
publication contains a partitioned table, the publication parameter
publish_via_partition_root determines if it uses the partition row
filter (if the parameter is false, the default) or the root
partitioned table row filter. During initial tablesync, it doesn't do
any special handling for partitions.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2022-01-06 10:24:09 Re: Add 64-bit XIDs into PostgreSQL 15
Previous Message Julien Rouhaud 2022-01-06 09:05:32 Re: pl/pgsql feature request: shorthand for argument and local variable references