Re: row filtering for logical replication

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Euler Taveira <euler(at)timbira(dot)com(dot)br>, Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2018-12-27 23:15:10
Message-ID: 70ccb3a2-48d2-c923-71c3-7bd5346a64aa@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 27/12/2018 20:05, Stephen Frost wrote:
> Greetings,
>
> * Petr Jelinek (petr(dot)jelinek(at)2ndquadrant(dot)com) wrote:
>> On 14/12/2018 16:38, Stephen Frost wrote:
>>> * Petr Jelinek (petr(dot)jelinek(at)2ndquadrant(dot)com) wrote:
>>>> I do see the appeal here, if you consider logical replication to be a
>>>> streaming select it probably applies well.
>>>>
>>>> But given that this is happening inside output plugin which does not
>>>> have full executor setup and has catalog-only snapshot I am not sure how
>>>> feasible it is to try to merge these two things. As per my previous
>>>> email it's possible that we'll have to be stricter about what we allow
>>>> in expressions here.
>>>
>>> I can certainly understand the concern about trying to combine the
>>> implementation of this with that of RLS; perhaps that isn't a good fit
>>> due to the additional constraints put on logical decoding.
>>>
>>> That said, I still think it might make sense to consider these filters
>>> for logical decoding to be policies and, ideally, to allow users to use
>>> the same policy for both.
>>
>> I am not against that as long as it's possible to have policy for
>> logical replication without having it for RLS and vice versa.
>
> RLS already is able to be enabled/disabled on a per-table basis. I
> could see how we might want to extend the existing policy system to have
> a way to enable/disable individual policies for RLS but that should be
> reasonably straight-forward to do, I would think.

Sure, I was mostly referring to having ability of enable/disable this
independently of enabling/disabling RLS which you are okay with based on
bellow so no issue there from my side.

>
>> I also wonder if policies are flexible enough to allow for specifying
>> OLD and NEW - the replication filtering deals with DML, not with what's
>> visible, it might very well depend on differences between these (that's
>> something the current patch is missing as well BTW).
>
> The policy system already has the notion of a 'visible' check and a
> 'does the new row match this' check (USING vs. WITH CHECK policies).
> Perhaps if you could outline the specific use-cases that you're thinking
> about, we could discuss them and make sure that they fit within those
> mechanisms- or, if not, discuss if such a use-case would make sense for
> RLS as well and, if so, figure out a way to support that for both.

So we'd use USING for old row images (UPDATE/DELETE) and WITH CHECK for
new ones (UPDATE/INSERT)? I think OLD/NEW is somewhat more natural
naming of this as there is no "SELECT" part of operation here, but as
long as the functionality is there I don't mind syntax that much.

>
>>> In the end, the idea of having to build a single large and complex
>>> 'create publication' command which has a bunch of tables, each with
>>> their own filter clauses, just strikes me as pretty painful.
>>>
>>>> The other issue with merging this is that the use-case for filtering out
>>>> the data in logical replication is not necessarily about security, but
>>>> often about sending only relevant data. So it makes sense to have filter
>>>> on publication without RLS enabled on table and if we'd force that, we'd
>>>> limit usefulness of this feature.
>>>
>>> I definitely have a serious problem if we are going to say that you
>>> can't use this filtering for security-sensitive cases.
>>
>> I am saying it should not be tied to only security sensitive cases,
>> because it has use cases that have nothing to do with security (ie, I
>> don't want this to depend on RLS being enabled for a table).
>
> I'm fine with this being able to be independently enabled/disabled,
> apart from RLS.
>

Cool.

>>>> We definitely want to eventually create subscriptions as non-superuser
>>>> but that has zero effect on this as everything here is happening on
>>>> different server than where subscription lives (we already allow
>>>> creation of publications with just CREATE privilege on database and
>>>> ownership of the table).
>>>
>>> What I wasn't clear about above was the idea that we might allow a user
>>> other than the table owner to publish a given table, but that such a
>>> publication should certanily only be allowed to include the rows which
>>> that user has access to- as regulated by RLS. If the RLS policy is too
>>> complex to allow that then I would think we'd simply throw an error at
>>> the create publication time and the would-be publisher would need to
>>> figure that out with the table owner.
>>
>> My opinion is that this is useful, but not necessarily something v1
>> patch needs to solve. Having too many publications and subscriptions to
>> various places is not currently practical anyway due to decoding
>> duplicating all the work for every connection.
>
> I agree that supporting this could be done in a later patch, however, I
> do feel that when we go to add support for non-owners to create
> publications then RLS needs to be supported at that point (and by more
> than just 'throw an error'). I can agree with incremental improvements
> but I don't want to get to a point where we've got a bunch of
> independent things only half of which work with other parts of the
> system.

Yes, using RLS infrastructure now will make it easier to add support for
publishing without being owner at some later point, just let's please
not make publishing without being owner part of requirements for this.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-12-27 23:15:14 Re: could recovery_target_timeline=latest be the default in standby mode?
Previous Message Tom Lane 2018-12-27 23:14:03 Re: Poor buildfarm coverage of strong-random alternatives