Re: row filtering for logical replication

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Suzuki Hironobu <hironobu(at)interdb(dot)jp>
Subject: Re: row filtering for logical replication
Date: 2018-11-23 17:54:58
Message-ID: 92e5587d-28b8-5849-2374-5ca3863256f1@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23/11/2018 17:15, Euler Taveira wrote:
> Em qui, 22 de nov de 2018 às 20:03, Petr Jelinek
> <petr(dot)jelinek(at)2ndquadrant(dot)com> escreveu:
>> Firstly, I am not sure if it's wise to allow UDFs in the filter clause
>> for the table. The reason for that is that we can't record all necessary
>> dependencies there because the functions are black box for parser. That
>> means if somebody drops object that an UDF used in replication filter
>> depends on, that function will start failing. But unlike for user
>> sessions it will start failing during decoding (well processing in
>> output plugin). And that's not recoverable by reading the missing
>> object, the only way to get out of that is either to move slot forward
>> which means losing part of replication stream and need for manual resync
>> or full rebuild of replication. Neither of which are good IMHO.
>>
> It is a foot gun but there are several ways to do bad things in
> postgres. CREATE PUBLICATION is restricted to superusers and role with
> CREATE privilege in current database. AFAICS a role with CREATE
> privilege cannot drop objects whose owner is not himself. I wouldn't
> like to disallow UDFs in row filtering expressions just because
> someone doesn't set permissions correctly. Do you have any other case
> in mind?

I don't think this has anything to do with security. Stupid example:

user1: CREATE EXTENSION citext;

user2: CREATE FUNCTION myfilter(col1 text, col2 text) returns boolean
language plpgsql as
$$BEGIN
RETURN col1::citext = col2::citext;
END;$$

user2: ALTER PUBLICATION mypub ADD TABLE mytab WHERE (myfilter(a,b));

[... replication happening ...]

user1: DROP EXTENSION citext;

And now replication is broken and unrecoverable without data loss.
Recreating extension will not help because the changes happening in
meantime will not see it in the historical snapshot.

I don't think it's okay to do completely nothing about this.

>
>> Secondly, do we want to at least notify user on filters (or maybe even
>> disallow them) with combination of action + column where column value
>> will not be logged? I mean for example we do this when processing the
>> filter against a row:
>>
>>> + ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, ecxt->ecxt_scantuple, false);
>>
> We could emit a LOG message. That could possibly be an option but it
> could be too complex for the first version.
>

Well, it needs walker which extracts Vars from the expression and checks
them against replica identity columns. We already have a way to fetch
replica identity columns and the walker could be something like
simplified version of the find_expr_references_walker used by the
recordDependencyOnSingleRelExpr (I don't think there is anything ready
made already).

>> But if user has expression on column which is not part of replica
>> identity that expression will always return NULL for DELETEs because
>> only replica identity is logged with actual values and everything else
>> in NULL in old_tuple. So if publication replicates deletes we should
>> check for this somehow.
>>
> In this case, we should document this behavior. That is a recurring
> question in wal2json issues. Besides that we should explain that
> UPDATE/DELETE tuples doesn't log all columns (people think the
> behavior is equivalent to triggers; it is not unless you set REPLICA
> IDENTITY FULL).
>
>> Not really sure that this is actually worth it given that we have to
>> allocate and free this in a loop now while before it was just sitting on
>> a stack.
>>
> That is a experimentation code that should be in a separate patch.
> Don't you think low memory use is a good goal? I also think that
> MaxTupleAttributeNumber is an extreme value. I didn't some preliminary
> tests and didn't notice overheads. I'll leave these modifications in a
> separate patch.
>

It's static memory and it's a few KB of it (it's just single array of
pointers, not array of data, and does not depend on the number of rows).
Palloc will definitely need more CPU cycles.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabrízio de Royes Mello 2018-11-23 18:05:21 Re: row filtering for logical replication
Previous Message Petr Jelinek 2018-11-23 17:53:42 Re: row filtering for logical replication