Re: row filtering for logical replication

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Euler Taveira <euler(at)timbira(dot)com(dot)br>, Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2018-12-27 19:05:11
Message-ID: 20181227190511.GI2528@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greetings,

* Petr Jelinek (petr(dot)jelinek(at)2ndquadrant(dot)com) wrote:
> On 14/12/2018 16:38, Stephen Frost wrote:
> > * Petr Jelinek (petr(dot)jelinek(at)2ndquadrant(dot)com) wrote:
> >> I do see the appeal here, if you consider logical replication to be a
> >> streaming select it probably applies well.
> >>
> >> But given that this is happening inside output plugin which does not
> >> have full executor setup and has catalog-only snapshot I am not sure how
> >> feasible it is to try to merge these two things. As per my previous
> >> email it's possible that we'll have to be stricter about what we allow
> >> in expressions here.
> >
> > I can certainly understand the concern about trying to combine the
> > implementation of this with that of RLS; perhaps that isn't a good fit
> > due to the additional constraints put on logical decoding.
> >
> > That said, I still think it might make sense to consider these filters
> > for logical decoding to be policies and, ideally, to allow users to use
> > the same policy for both.
>
> I am not against that as long as it's possible to have policy for
> logical replication without having it for RLS and vice versa.

RLS already is able to be enabled/disabled on a per-table basis. I
could see how we might want to extend the existing policy system to have
a way to enable/disable individual policies for RLS but that should be
reasonably straight-forward to do, I would think.

> I also wonder if policies are flexible enough to allow for specifying
> OLD and NEW - the replication filtering deals with DML, not with what's
> visible, it might very well depend on differences between these (that's
> something the current patch is missing as well BTW).

The policy system already has the notion of a 'visible' check and a
'does the new row match this' check (USING vs. WITH CHECK policies).
Perhaps if you could outline the specific use-cases that you're thinking
about, we could discuss them and make sure that they fit within those
mechanisms- or, if not, discuss if such a use-case would make sense for
RLS as well and, if so, figure out a way to support that for both.

> > In the end, the idea of having to build a single large and complex
> > 'create publication' command which has a bunch of tables, each with
> > their own filter clauses, just strikes me as pretty painful.
> >
> >> The other issue with merging this is that the use-case for filtering out
> >> the data in logical replication is not necessarily about security, but
> >> often about sending only relevant data. So it makes sense to have filter
> >> on publication without RLS enabled on table and if we'd force that, we'd
> >> limit usefulness of this feature.
> >
> > I definitely have a serious problem if we are going to say that you
> > can't use this filtering for security-sensitive cases.
>
> I am saying it should not be tied to only security sensitive cases,
> because it has use cases that have nothing to do with security (ie, I
> don't want this to depend on RLS being enabled for a table).

I'm fine with this being able to be independently enabled/disabled,
apart from RLS.

> >> We definitely want to eventually create subscriptions as non-superuser
> >> but that has zero effect on this as everything here is happening on
> >> different server than where subscription lives (we already allow
> >> creation of publications with just CREATE privilege on database and
> >> ownership of the table).
> >
> > What I wasn't clear about above was the idea that we might allow a user
> > other than the table owner to publish a given table, but that such a
> > publication should certanily only be allowed to include the rows which
> > that user has access to- as regulated by RLS. If the RLS policy is too
> > complex to allow that then I would think we'd simply throw an error at
> > the create publication time and the would-be publisher would need to
> > figure that out with the table owner.
>
> My opinion is that this is useful, but not necessarily something v1
> patch needs to solve. Having too many publications and subscriptions to
> various places is not currently practical anyway due to decoding
> duplicating all the work for every connection.

I agree that supporting this could be done in a later patch, however, I
do feel that when we go to add support for non-owners to create
publications then RLS needs to be supported at that point (and by more
than just 'throw an error'). I can agree with incremental improvements
but I don't want to get to a point where we've got a bunch of
independent things only half of which work with other parts of the
system.

> > * Euler Taveira (euler(at)timbira(dot)com(dot)br) wrote:
> >> Em sex, 23 de nov de 2018 às 11:40, Petr Jelinek
> >> <petr(dot)jelinek(at)2ndquadrant(dot)com> escreveu:
> >
> >>> The other issue with merging this is that the use-case for filtering out
> >>> the data in logical replication is not necessarily about security, but
> >>> often about sending only relevant data. So it makes sense to have filter
> >>> on publication without RLS enabled on table and if we'd force that, we'd
> >>> limit usefulness of this feature.
> >>
> >> Use the same infrastructure as RLS could be a good idea but use RLS
> >> for row filtering is not. RLS is complex.
> >
> > Right, this was along the lines I was thinking of- using the
> > infrastructure and the policy system, in particular.
>
> Yeah that part is definitely worth investigating.

Glad to hear that.

Thanks!

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-12-27 19:19:25 Re: row filtering for logical replication
Previous Message Tom Lane 2018-12-27 18:57:04 Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)