Re: row filtering for logical replication

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com(dot)br>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, hironobu(at)interdb(dot)jp
Subject: Re: row filtering for logical replication
Date: 2018-12-27 23:36:25
Message-ID: abd77986-fb20-9fe1-4447-3e11c5039fe9@2ndquadrant.com
Lists: pgsql-hackers

On 27/12/2018 20:19, Stephen Frost wrote:
> Greetings,
>
> * Petr Jelinek (petr(dot)jelinek(at)2ndquadrant(dot)com) wrote:
>> On 14/12/2018 16:56, Stephen Frost wrote:
>>> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
>>>> On 11/23/18 8:03 PM, Stephen Frost wrote:
>>>>> * Fabrízio de Royes Mello (fabriziomello(at)gmail(dot)com) wrote:
>>>>>> On Fri, Nov 23, 2018 at 4:13 PM Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
>>>>>> wrote:
>>>>>>>> If carefully documented I see no problem with it... we already have an
>>>>>>>> analogous problem with functional indexes.
>>>>>>>
>>>>>>> The difference is that with functional indexes you can recreate the
>>>>>>> missing object and everything is okay again. With logical replication
>>>>>>> recreating the object will not help.
>>>>>>>
>>>>>>
>>>>>> In this case with logical replication you should rsync the object. That is
>>>>>> the price of misunderstanding / bad use of the new feature.
>>>>>>
>>>>>> As usual, there are no free beer ;-)
>>>>>
>>>>> There's also certainly no shortage of other ways to break logical
>>>>> replication, including ways that would also be hard to recover from
>>>>> today other than doing a full resync.
>>>>
>>>> Sure, but that seems more like an argument against creating additional
>>>> ones (and for preventing those that already exist). I'm not sure this
>>>> particular feature is where we should draw the line, though.
>>>
>>> I was actually going in the other direction- we should allow it because
>>> advanced users may know what they're doing better than we do and we
>>> shouldn't prevent things just because they might be misused or
>>> misunderstood by a user.
>>
>> That's all good, but we need good escape hatch for when things go south
>> and we don't have it and IMHO it's not as easy to have one as you might
>> think.
>
> We don't have a great solution but we should be able to at least drop
> and recreate the publication or subscription, even today, can't we?

Well, we can always drop things, yes; not having the ability to drop things
when they break would be bad design. What I am debating is the ability to
recover without rebuilding everything, as there are cases where you simply
can't rebuild everything (i.e. we allow filtering out deletes). I don't like
disabling UDFs either, as that means user-created types are unusable in
filters; I just wonder if saying "sorry, your replica is gone" is any
better.
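To make the delete case concrete, with the proposed syntax (illustrative, table and column names made up) a filtered publication might look like:

```sql
-- Hypothetical row filter per the patch under discussion:
-- only rows with active = true are published.
CREATE PUBLICATION pub_active FOR TABLE accounts
    WHERE (active = true);
```

If a row is deleted while it doesn't match the filter, the subscriber never hears about it; nothing you later recreate on the publisher can regenerate that missed change, which is the difference from the functional index case.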

> Sure, that means having to recopy everything, but that's what you get if
> you break your publication/subscription.

This is a bit off-topic here, but I really wonder how you are currently
breaking your publications/subscriptions.

>>>>> What that seems to indicate, to me at least, is that it'd be awful
>>>>> nice to have a way to resync the data which doesn't necessairly
>>>>> involve transferring all of it over again.
>>>>>
>>>>> Of course, it'd be nice if we could track those dependencies too,
>>>>> but that's yet another thing.
>>>>
>>>> Yep, that seems like a good idea in general. Both here and for
>>>> functional indexes (although I suppose there is a technical reason why it
>>>> wasn't implemented right away for them).
>>>
>>> We don't track function dependencies in general and I could certainly
>>> see cases where you really wouldn't want to do so, at least not in the
>>> same way that we track FKs or similar. I do wonder if maybe we didn't
>>> track function dependencies because we didn't (yet) have create or
>>> replace function and that now we should. We don't track dependencies
>>> inside a function either though.
>>
>> Yeah we can't always have dependencies, it would break some perfectly
>> valid usage scenarios. Also it's not exactly clear to me how we'd track
>> dependencies of say plpython function...
>
> Well, we could at least depend on the functions explicitly listed at
> the top level and I don't believe we even do that today. I can't think
> of any downside off-hand to that, given that we have create-or-replace
> function.
>

I dunno how much that is worth TBH. The situations where I've seen this
issue (pglogical has had this feature for a long time and suffers from the
same lack of dependency tracking) are when somebody drops a table/type
used in a function that is used as a filter.
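The failure mode I mean looks roughly like this (names and the row-filter syntax are illustrative):

```sql
-- A type used only inside the filter function's body; function
-- bodies are not dependency-tracked, so the DROP below succeeds.
CREATE TYPE tier AS ENUM ('free', 'paid');
CREATE FUNCTION is_paid(plan text) RETURNS boolean
    AS $$ SELECT plan::tier = 'paid' $$ LANGUAGE sql;

-- Hypothetical row-filter syntax from the patch under discussion.
CREATE PUBLICATION pub_paid FOR TABLE customers WHERE (is_paid(plan));

DROP TYPE tier;  -- no error: the reference inside the body is invisible

-- From now on, decoding any change to customers makes the filter
-- fail with "type tier does not exist" and replication is stuck.
```

Recreating the type afterwards does fix the filter going forward, but the changes that failed to decode in the meantime are the problem.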

>>>>> In short, I'm not sure that I agree with the idea that we shouldn't
>>>>> allow this and instead I'd rather we realize it and put the logical
>>>>> replication into some kind of an error state that requires a resync.
>>>>
>>>> That would still mean a need to resync the data to recover, so I'm not
>>>> sure it's really an improvement. And I suppose it'd require tracking the
>>>> dependencies, because how else would you mark the subscription as
>>>> requiring a resync? At which point we could decline the DROP without a
>>>> CASCADE, just like we do elsewhere, no?
>>>
>>> I was actually thinking more along the lines of just simply marking the
>>> publication/subscription as being in a 'failed' state when a failure
>>> actually happens, and maybe even at that point basically throwing away
>>> everything except the shell of the publication/subscription (so the user
>>> can see that it failed and come in and properly drop it); I'm thinking
>>> about this as perhaps similar to a transaction being aborted.
>>
>> There are several problems with that. First this happens in historic
>> snapshot which can't write and on top of that we are in the middle of
>> error processing so we have our hands tied a bit, it's definitely going
>> to need bit of creative thinking to do this.
>
> We can't write to things inside the database in a historic snapshot and
> we do have to deal with the fact that we're in error processing. What
> about writing somewhere that's outside of the regular database system?
> Maybe a pg_logical/failed directory? There's all the usual
> complications from that around dealing with durable writes (if we need
> to worry about that and I'm not sure that we do... if we fail to
> persist a write saying "X failed" and we restart.. well, it's gonna fail
> again and we write it then), and cleaning things up as needed (but maybe
> this is handled as part of the DROP, and we WAL that, so we can re-do
> the removal of the failed marker file...), and if we need to think about
> what should happen on replicas (is there anything?).

That sounds pretty reasonable. Given that this is a corner-case user error,
we could perhaps do the extra work to ensure things are fsynced even if
it's all not too fast...

>
>> Second, and that's more soft issue (which is probably harder to solve)
>> what do we do with the slot and subscription. There is one failed
>> publication, but the subscription may be subscribed to 20 of them, do we
>> kill the whole subscription because of single failed publication? If we
>> don't do we continue replicating like nothing has happened but with data
>> in the failed publication missing (which can be considered data
>> loss/corruption from the view of user). If we stop replication, do we
>> clean the slot so that we don't keep back wal/catalog xmin forever
>> (which could lead to server stopping) or do we keep the slot so that
>> user can somehow fix the issue (reconfigure subscription to not care
>> about that publication for example) and continue replication without
>> further loss?
>
> I would think we'd have to fail the whole publication if there's a
> failure for any part of it. Replicating a partial set definitely sounds
> wrong to me. Once we stop replication, yes, we should clean the slot
> and mark it failed so that we don't keep back WAL and so that we allow
> the catalog xmin to move forward so that the failed publication doesn't
> run the server out of disk space.
>

I agree that continuing replication when some part of a publication is
broken seems wrong, and that we should stop replication at that point.

> If we really think there's a use-case for keeping the replication slot

It's not so much about a use-case as it is about a complete change of
behavior - there is currently no error on which we remove an existing slot.
The use case for keeping the slot is a) investigation of the issue, and
b) just skipping the broken part of the stream by advancing the origin on
the subscription and continuing replication; with some luck that can mean
only a single table needs resyncing, which is better than rebuilding
everything.

I think some kind of automated slot cleanup is desirable, but that's
likely a separate feature that should be designed based on the amount of
outstanding WAL or something.
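For reference, the manual skip I have in mind when the slot is kept goes roughly like this on the subscriber (subscription name, origin name and LSN are of course placeholders for the specific situation):

```sql
-- Stop applying, jump the origin past the broken portion of the
-- stream, then resume.
ALTER SUBSCRIPTION mysub DISABLE;
SELECT pg_replication_origin_advance('pg_16389',        -- this sub's origin
                                     '0/1A2B3C4D'::pg_lsn);
ALTER SUBSCRIPTION mysub ENABLE;

-- Then resync just the affected table, e.g. by refreshing the
-- publication membership with a data copy.
ALTER SUBSCRIPTION mysub REFRESH PUBLICATION WITH (copy_data = true);
```

Crude, but it beats dropping the subscription and recopying every table.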

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
