Re: row filtering for logical replication

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Euler Taveira <euler(at)timbira(dot)com(dot)br>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Suzuki Hironobu <hironobu(at)interdb(dot)jp>
Subject: Re: row filtering for logical replication
Date: 2018-11-22 23:03:27
Message-ID: 7106a0fc-8017-c0fe-a407-9466c9407ff8@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/11/2018 01:29, Euler Taveira wrote:
> Em qua, 28 de fev de 2018 às 20:03, Euler Taveira
> <euler(at)timbira(dot)com(dot)br> escreveu:
>> The attached patches add support for filtering rows in the publisher.
>>
> I rebased the patch. I added row filtering for initial
> synchronization, pg_dump support and psql support. 0001 removes unused
> code. 0002 reduces memory use. 0003 passes only structure member that
> is used in create_estate_for_relation. 0004 reuses a parser node for
> row filtering. 0005 is the feature. 0006 prints WHERE expression in
> psql. 0007 adds pg_dump support. 0008 is only for debug purposes (I'm
> not sure some of these messages will be part of the final patch).
> 0001, 0002, 0003 and 0008 are not mandatory for this feature.
>
> Comments?
>

Hi,

I think there are two main topics that still need to be discussed about
this patch.

Firstly, I am not sure if it's wise to allow UDFs in the filter clause
for the table. The reason for that is that we can't record all necessary
dependencies there because the functions are black box for parser. That
means if somebody drops object that an UDF used in replication filter
depends on, that function will start failing. But unlike for user
sessions it will start failing during decoding (well processing in
output plugin). And that's not recoverable by reading the missing
object, the only way to get out of that is either to move slot forward
which means losing part of replication stream and need for manual resync
or full rebuild of replication. Neither of which are good IMHO.

Secondly, do we want to at least notify user on filters (or maybe even
disallow them) with combination of action + column where column value
will not be logged? I mean for example we do this when processing the
filter against a row:

> + ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple, ecxt->ecxt_scantuple, false);

But if user has expression on column which is not part of replica
identity that expression will always return NULL for DELETEs because
only replica identity is logged with actual values and everything else
in NULL in old_tuple. So if publication replicates deletes we should
check for this somehow.

Btw about code (you already fixed the wrong reloid in sync so skipping
that).

0002:
> + for (tupn = 0; tupn < walres->ntuples; tupn++)
> {
> - char *cstrs[MaxTupleAttributeNumber];
> + char **cstrs;
>
> CHECK_FOR_INTERRUPTS();
>
> /* Do the allocations in temporary context. */
> oldcontext = MemoryContextSwitchTo(rowcontext);
>
> + cstrs = palloc(nfields * sizeof(char *));

Not really sure that this is actually worth it given that we have to
allocate and free this in a loop now while before it was just sitting on
a stack.

0005:
> @@ -654,5 +740,10 @@ rel_sync_cache_publication_cb(Datum arg, int cacheid, uint32 hashvalue)
> */
> hash_seq_init(&status, RelationSyncCache);
> while ((entry = (RelationSyncEntry *) hash_seq_search(&status)) != NULL)
> + {
> entry->replicate_valid = false;
> + if (list_length(entry->row_filter) > 0)
> + list_free(entry->row_filter);
> + entry->row_filter = NIL;
> + }

Won't this leak memory? The list_free only frees the list cells, but not
the nodes you stored there before.

Also I think we should document here that the expression is run with the
session environment of the replication connection (so that it's more
obvious that things like CURRENT_USER will not return user which changed
tuple but the replication user).

It would be nice if 0006 had regression test and 0007 TAP test.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2018-11-22 23:09:47 Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query
Previous Message Andrew Dunstan 2018-11-22 22:12:31 Re: [RFC] Removing "magic" oids