Re: row filtering for logical replication

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: david(at)pgmasters(dot)net
Cc: Euler Taveira <euler(at)timbira(dot)com(dot)br>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Amit Langote <amitlangote09(at)gmail(dot)com>, movead li <movead(dot)li(at)highgo(dot)ca>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2020-12-17 06:43:30
Message-ID: CACawEhW_iMnY9XK2tEb1ig+A+gKeB4cxdJcxMsoCU0SaKPExxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

I'm also interested in this patch. I rebased the changes to the current
master branch and attached. The rebase had two issues. First, patch-8 was
conflicting, and that seems only helpful for debugging purposes during
development. So, I dropped it for simplicity. Second, the changes have a
conflict with `publish_via_partition_root` changes. I tried to fix the
issues, but ended-up having a limitation for now. The limitation is that
"cannot create publication with WHERE clause on the partitioned table
without publish_via_partition_root is set to true". This restriction can be
lifted, though I left out for the sake of focusing on the some issues that
I observed on this patch.

Please see my review:

+ if (list_length(relentry->qual) > 0)
+ {
+ HeapTuple old_tuple;
+ HeapTuple new_tuple;
+ TupleDesc tupdesc;
+ EState *estate;
+ ExprContext *ecxt;
+ MemoryContext oldcxt;
+ ListCell *lc;
+ bool matched = true;
+
+ old_tuple = change->data.tp.oldtuple ?
&change->data.tp.oldtuple->tuple : NULL;
+ new_tuple = change->data.tp.newtuple ?
&change->data.tp.newtuple->tuple : NULL;
+ tupdesc = RelationGetDescr(relation);
+ estate = create_estate_for_relation(relation);
+
+ /* prepare context per tuple */
+ ecxt = GetPerTupleExprContext(estate);
+ oldcxt = MemoryContextSwitchTo(estate->es_query_cxt);
+ ecxt->ecxt_scantuple = ExecInitExtraTupleSlot(estate,
tupdesc, &TTSOpsHeapTuple);
+
+ ExecStoreHeapTuple(new_tuple ? new_tuple : old_tuple,
ecxt->ecxt_scantuple, false);
+
+ foreach(lc, relentry->qual)
+ {
+ Node *qual;
+ ExprState *expr_state;
+ Expr *expr;
+ Oid expr_type;
+ Datum res;
+ bool isnull;
+
+ qual = (Node *) lfirst(lc);
+
+ /* evaluates row filter */
+ expr_type = exprType(qual);
+ expr = (Expr *) coerce_to_target_type(NULL, qual,
expr_type, BOOLOID, -1, COERCION_ASSIGNMENT, COERCE_IMPLICIT_CAST, -1);
+ expr = expression_planner(expr);
+ expr_state = ExecInitExpr(expr, NULL);
+ res = ExecEvalExpr(expr_state, ecxt, &isnull);
+
+ /* if tuple does not match row filter, bail out */
+ if (!DatumGetBool(res) || isnull)
+ {
+ matched = false;
+ break;
+ }
+ }
+
+ MemoryContextSwitchTo(oldcxt);
+

The above part can be considered the core of the logic, executed per tuple.
As far as I can see, it has two downsides.

First, calling `expression_planner()` for every tuple can be quite
expensive. I created a sample table, loaded data and ran a quick benchmark
to see its effect. I attached the very simple script that I used to
reproduce the issue on my laptop. I'm pretty sure you can find nicer ways
of doing similar perf tests, just sharing as a reference.

The idea of the test is to add a WHERE clause to a table, but none of the
tuples are filtered out. They just go through this code-path and send it to
the remote node.

#rows Patched | Master
1M 00:00:25.067536 | 00:00:16.633988
10M 00:04:50.770791 | 00:02:40.945358

So, it seems a significant overhead to me. What do you think?

Secondly, probably more importantly, allowing any operator is as dangerous
as allowing any function as users can create/overload operator(s). For
example, assume that users create an operator which modifies the table that
is being filtered out:

```
CREATE OR REPLACE FUNCTION function_that_modifies_table(left_art INTEGER,
right_arg INTEGER)
RETURNS BOOL AS
$$
BEGIN

INSERT INTO test SELECT * FROM test;

return left_art > right_arg;
END;
$$ LANGUAGE PLPGSQL VOLATILE;

CREATE OPERATOR >>= (
PROCEDURE = function_that_modifies_table,
LEFTARG = INTEGER,
RIGHTARG = INTEGER
);

CREATE PUBLICATION pub FOR TABLE test WHERE (key >>= 0);
``

With the above, we seem to be in trouble. Although the above is an extreme
example, it felt useful to share to the extent of the problem. We probably
cannot allow any free-form SQL to be on the filters.

To overcome these issues, one approach could be to rely on known safe
operators and functions. I believe the btree and hash operators should
provide a pretty strong coverage across many use cases. As far as I can
see, the procs that the following query returns can be our baseline:

```
select DISTINCT amproc.amproc::regproc AS opfamily_procedure
from pg_am am,
pg_opfamily opf,
pg_amproc amproc
where opf.opfmethod = am.oid
and amproc.amprocfamily = opf.oid
order by
opfamily_procedure;
```

With that, we aim to prevent users easily shooting themselves by the foot.

The other problematic area was the performance, as calling
`expression_planner()` for every tuple can be very expensive. To avoid
that, it might be considered to ask users to provide a function instead of
a free form WHERE clause, such that if the function returns true, the tuple
is sent. The allowed functions need to be immutable SQL functions with bool
return type. As we can parse the SQL functions, we should be able to allow
only functions that rely on the above mentioned procs. We can apply as many
restrictions (such as no modification query) as possible. For example, see
below:
```

CREATE OR REPLACE function filter_tuples_for_test(int) returns bool as
$body$
select $1 > 100;
$body$
language sql immutable;

CREATE PUBLICATION pub FOR TABLE test FILTER = filter_tuples_for_tes(key);
```

In terms of performance, calling the function should avoid calling the
`expression_planner()` and yield better performance. Though, this needs to
be verified.

If such an approach makes sense, I'd be happy to work on the patch. Please
provide me feedback.

Thanks,
Onder KALACI
Software Engineer at Microsoft &
Developing the Citus database extension for PostgreSQL

David Steele <david(at)pgmasters(dot)net>, 16 Ara 2020 Çar, 21:43 tarihinde şunu
yazdı:

> On 3/3/20 12:39 PM, David Steele wrote:
> > Hi Euler,
> >
> > On 1/21/20 2:32 AM, Craig Ringer wrote:
> >> On Fri, 17 Jan 2020 at 07:58, Euler Taveira <euler(at)timbira(dot)com(dot)br>
> wrote:
> >>>
> >>> Em qui., 16 de jan. de 2020 às 18:57, Tomas Vondra
> >>> <tomas(dot)vondra(at)2ndquadrant(dot)com> escreveu:
> >>>>
> >>>> Euler, this patch is still in "waiting on author" since 11/25. Do you
> >>>> plan to review changes made by Amit in the patches he submitted, or
> >>>> what
> >>>> are your plans with this patch?
> >>>>
> >>> Yes, I'm working on Amit suggestions. I'll post a new patch as soon
> >>> as possible.
> >>
> >> Great. I think this'd be nice to see.
> >
> > The last CF for PG13 has started. Do you have a new patch ready?
>
> I have marked this patch Returned with Feedback since no new patch has
> been posted.
>
> Please submit to a future CF when a new patch is available.
>
> Regards,
> --
> -David
> david(at)pgmasters(dot)net
>
>
>
>
>

Attachment Content-Type Size
0001-Subject-PATCH-1-8-Remove-unused-atttypmod-column-fro.patch application/octet-stream 1.8 KB
0003-Subject-PATCH-3-8-Refactor-function-create_estate_fo.patch application/octet-stream 2.0 KB
0004-Subject-PATCH-4-8-Rename-a-WHERE-node.patch application/octet-stream 1.8 KB
0002-Subject-PATCH-2-8-Store-number-of-tuples-in-WalRcvEx.patch application/octet-stream 3.1 KB
0005-Subject-PATCH-5-8-Row-filtering-for-logical-replicat.patch application/octet-stream 49.0 KB
0006-Subject-PATCH-6-8-Print-publication-WHERE-condition-.patch application/octet-stream 1.2 KB
0007-Publication-where-condition-support-for-pg_dump.patch application/octet-stream 2.8 KB
commands_to_perf_test.sql application/octet-stream 1.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-12-17 07:04:52 Re: Cache relation sizes?
Previous Message Masahiko Sawada 2020-12-17 06:43:11 Re: [HACKERS] logical decoding of two-phase transactions