Re: row filtering for logical replication

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com(dot)br>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, hironobu(at)interdb(dot)jp
Subject: Re: row filtering for logical replication
Date: 2018-12-15 14:23:39
Message-ID: fdfd8677-120a-3dbe-2078-4187b3c62939@2ndquadrant.com
Lists: pgsql-hackers

On 14/12/2018 16:56, Stephen Frost wrote:
> Greetings,
>
> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
>> On 11/23/18 8:03 PM, Stephen Frost wrote:
>>> * Fabrízio de Royes Mello (fabriziomello(at)gmail(dot)com) wrote:
>>>> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
>>>> wrote:
>>>>>> If carefully documented I see no problem with it... we already have an
>>>>>> analogous problem with functional indexes.
>>>>>
>>>>> The difference is that with functional indexes you can recreate the
>>>>> missing object and everything is okay again. With logical replication
>>>>> recreating the object will not help.
>>>>>
>>>>
>>>> In this case with logical replication you should rsync the object. That is
>>>> the price of misunderstanding / bad use of the new feature.
>>>>
>>>> As usual, there are no free beer ;-)
>>>
>>> There's also certainly no shortage of other ways to break logical
>>> replication, including ways that would also be hard to recover from
>>> today other than doing a full resync.
>>
>> Sure, but that seems more like an argument against creating additional
>> ones (and for preventing those that already exist). I'm not sure this
>> particular feature is where we should draw the line, though.
>
> I was actually going in the other direction- we should allow it because
> advanced users may know what they're doing better than we do and we
> shouldn't prevent things just because they might be misused or
> misunderstood by a user.
>

That's all good, but we need a good escape hatch for when things go south,
and we don't have one; IMHO it's not as easy to build as you might think.

That's why I would do the simple and safe thing first before allowing
more; otherwise we'll be discussing this for the next couple of PG versions.
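To make the failure mode under discussion concrete, here is a hedged
sketch using the WHERE-clause style proposed in this thread (the exact
syntax in the patch may differ, and the table/function names are
hypothetical):

```sql
-- A user-defined function used in a row filter (hypothetical names).
CREATE FUNCTION is_active(status text) RETURNS boolean
    AS $$ SELECT status = 'active' $$ LANGUAGE sql IMMUTABLE;

CREATE TABLE accounts (id int PRIMARY KEY, status text);

-- Row-filtered publication, roughly as proposed in this thread.
CREATE PUBLICATION active_accounts
    FOR TABLE accounts WHERE (is_active(status));

-- With no dependency tracked, this DROP succeeds; decoding on the
-- publishing side then errors out, and unlike a functional index,
-- re-creating the function does not recover the changes that failed
-- to replicate in the meantime.
DROP FUNCTION is_active(text);
```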

>>> What that seems to indicate, to me at least, is that it'd be awful
>>> nice to have a way to resync the data which doesn't necessarily
>>> involve transferring all of it over again.
>>>
>>> Of course, it'd be nice if we could track those dependencies too,
>>> but that's yet another thing.
>>
>> Yep, that seems like a good idea in general. Both here and for
>> functional indexes (although I suppose there is a technical reason why it
>> wasn't implemented right away for them).
>
> We don't track function dependencies in general and I could certainly
> see cases where you really wouldn't want to do so, at least not in the
> same way that we track FKs or similar. I do wonder if maybe we didn't
> track function dependencies because we didn't (yet) have create or
> replace function and that now we should. We don't track dependencies
> inside a function either though.

Yeah, we can't always track dependencies; it would break some perfectly
valid usage scenarios. Also, it's not exactly clear to me how we'd track
the dependencies of, say, a plpython function...
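For illustration, a sketch of why a PL/Python body is opaque to
dependency tracking (hypothetical function and table names): the body is
just a string to PostgreSQL, and any referenced objects are only
resolved when the function actually runs.

```sql
-- To the catalogs this body is an opaque string; the reference to
-- some_table is only resolved when the function executes, so there
-- is nothing to hang a pg_depend entry on at creation time.
CREATE FUNCTION count_rows() RETURNS bigint AS $$
    rv = plpy.execute("SELECT count(*) AS n FROM some_table")
    return rv[0]["n"]
$$ LANGUAGE plpython3u;
```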

>
>>> In short, I'm not sure that I agree with the idea that we shouldn't
>>> allow this and instead I'd rather we realize it and put the logical
>>> replication into some kind of an error state that requires a resync.
>>
>> That would still mean a need to resync the data to recover, so I'm not
>> sure it's really an improvement. And I suppose it'd require tracking the
>> dependencies, because how else would you mark the subscription as
>> requiring a resync? At which point we could decline the DROP without a
>> CASCADE, just like we do elsewhere, no?
>
> I was actually thinking more along the lines of just simply marking the
> publication/subscription as being in a 'failed' state when a failure
> actually happens, and maybe even at that point basically throwing away
> everything except the shell of the publication/subscription (so the user
> can see that it failed and come in and properly drop it); I'm thinking
> about this as perhaps similar to a transaction being aborted.

There are several problems with that. First, this happens under a
historic snapshot, which can't write, and on top of that we are in the
middle of error processing, so our hands are tied a bit; it's definitely
going to need a bit of creative thinking.

Second, and this is a softer issue (which is probably harder to solve):
what do we do with the slot and the subscription? There is one failed
publication, but the subscription may be subscribed to 20 of them; do we
kill the whole subscription because of a single failed publication? If
we don't, do we continue replicating as if nothing had happened, but
with the data from the failed publication missing (which can be
considered data loss/corruption from the user's point of view)? If we
stop replication, do we clean up the slot so that we don't hold back
WAL/catalog xmin forever (which could eventually bring the server to a
halt), or do we keep the slot so that the user can somehow fix the issue
(for example, reconfigure the subscription to not care about that
publication) and continue replication without further loss?
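For reference, the WAL/catalog-xmin retention that an abandoned slot
causes can be watched via pg_replication_slots; a sketch of the kind of
monitoring query a user could run while deciding what to do with a stuck
subscription:

```sql
-- How much WAL and catalog history each slot is holding back.
SELECT slot_name, active, restart_lsn, catalog_xmin,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;
```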

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
