Re: row filtering for logical replication

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Rahila Syed" <rahilasyed90(at)gmail(dot)com>, "Peter Eisentraut" <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "David Steele" <david(at)pgmasters(dot)net>, "Craig Ringer" <craig(at)2ndquadrant(dot)com>, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "Amit Langote" <amitlangote09(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-03-31 01:45:55
Message-ID: 86d76e89-9a96-411a-8077-f8918cbae2e1@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 30, 2021, at 8:23 AM, Amit Kapila wrote:
> On Mon, Mar 29, 2021 at 6:47 PM Euler Taveira <euler(at)eulerto(dot)com <mailto:euler%40eulerto.com>> wrote:
> >
> Few comments:
> ==============
> 1. How can we specify row filters for multiple tables for a
> publication? Consider a case as below:
It is not possible. Row filter is a per table option. Isn't it clear from the
synopsis? The current design allows different row filter for tables in the same
publication. It is more flexible than a single row filter for a set of tables
(even if we would support such variant, there are some cases where the
condition should be different because the column names are not the same). You
can easily build a CREATE PUBLICATION command that adds the same row filter
multiple times using a DO block or use a similar approach in your favorite
language.

> postgres=# CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
> CREATE TABLE
> postgres=# CREATE TABLE tab_rowfilter_2 (c int primary key);
> CREATE TABLE
>
> postgres=# CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1,
> tab_rowfilter_2 WHERE (a > 1000 AND b <> 'filtered');
> ERROR: column "a" does not exist
> LINE 1: ...FOR TABLE tab_rowfilter_1, tab_rowfilter_2 WHERE (a > 1000 A...
>
> ^
>
> postgres=# CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1,
> tab_rowfilter_2 WHERE (c > 1000);
> CREATE PUBLICATION
>
> It gives an error when I tried to specify the columns corresponding to
> the first relation but is fine for columns for the second relation.
> Then, I tried few more combinations like below but that didn't work.
> CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 As t1,
> tab_rowfilter_2 As t2 WHERE (t1.a > 1000 AND t1.b <> 'filtered');
>
> Will users be allowed to specify join conditions among columns from
> multiple tables?
It seems you are envisioning row filter as a publication property instead of a
publication-relation property. Due to the flexibility that the later approach
provides, I decided to use it because it covers more use cases. Regarding
allowing joins, it could possibly slow down a critical path, no? This code path
is executed by every change. If there are interest in the join support, we
might add it in a future patch.

> 2.
> + /*
> + * Although ALTER PUBLICATION grammar allows WHERE clause to be specified
> + * for DROP TABLE action, it doesn't make sense to allow it. We implement
> + * this restriction here, instead of complicating the grammar to enforce
> + * it.
> + */
> + if (stmt->tableAction == DEFELEM_DROP)
> + {
> + ListCell *lc;
> +
> + foreach(lc, stmt->tables)
> + {
> + PublicationTable *t = lfirst(lc);
> +
> + if (t->whereClause)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot use a WHERE clause when removing table from
> publication \"%s\"",
> + NameStr(pubform->pubname))));
> + }
> + }
>
> Is there a reason to deal with this here separately rather than in the
> ALTER PUBLICATION grammar?
Good question. IIRC the issue is that AlterPublicationStmt->tables has a list
element that was a relation_expr_list and was converted to
publication_table_list. If we share 'tables' with relation_expr_list (for ALTER
PUBLICATION ... DROP TABLE) and publication_table_list (for the other ALTER
PUBLICATION ... ADD|SET TABLE), the OpenTableList() has to know what list
element it is dealing with. I think I came to the conclusion that it is less
uglier to avoid changing OpenTableList() and CloseTableList().

[Doing some experimentation...]

Here is a patch that remove the referred code. It uses 2 distinct list
elements: relation_expr_list for ALTER PUBLICATION ... DROP TABLE and
publication_table_list for for ALTER PUBLICATION ... ADD|SET TABLE. A new
parameter was introduced to deal with the different elements of the list
'tables'.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachment Content-Type Size
0001-Row-filter-for-logical-replication.patch text/x-patch 67.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-03-31 01:51:34 Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
Previous Message Andy Fan 2021-03-31 01:43:05 Re: What to call an executor node which lazily caches tuples in a hash table?