Re: Detecting skipped data from logical slots (data silently skipped)

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Subject: Re: Detecting skipped data from logical slots (data silently skipped)
Date: 2016-08-08 02:59:20
Message-ID: CAMsr+YFiEWw1qZg8rxXnhHGZsWiTxajDZHPhBKiLmU-q+t=EmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5 August 2016 at 14:07, Andres Freund <andres(at)anarazel(dot)de> wrote:

>
> > > The simplest fix would be to require downstreams to flush their
> replication
> > > origin when they get a hot standby feedback message, before they send a
> > > reply with confirmation. That could be somewhat painful for
> performance, but
> > > can be alleviated somewhat by waiting for the downstream postgres to
> get
> > > around to doing a flush anyway and only forcing it if we're getting
> close to
> > > the walsender timeout. That's pretty much what BDR and pglogical do
> when
> > > applying transactions to avoid having to do a disk flush for each and
> every
> > > applied xact. Then we change START_REPLICATION ... LOGICAL so it
> ERRORs if
> > > you ask for a too-old LSN rather than silently ignoring it.
> >
> > That's basically just proposing to revert this broken optimization,
> > IIUC, and instead just try not to flush too often on the standby.
>
> The effect of the optimization is *massive* if you are replicating a
> less active database, or a less active subset of a database, in a
> cluster with lots of other activity. I don't think that can just be
> disregard, to protect against something with plenty of other failure
> scenarios.
>

Right. Though if we flush lazily I'm surprised the effect is that big,
you're the one who did the work and knows the significance of it.

All I'm trying to say is that I think the current behaviour is too
dangerous. It doesn't just lead to failure, but easy, undetectable, silent
failure when users perform common and simple tasks like starting a snapshot
or filesystem-level pg_start_backup() copy of a DB. The only reason it
can't happen for pg_basebackup too is that we omit slots during
pg_basebackup . That inconsistency between snapshot/fs-level and
pg_basebackup is unfortunate but understandable.

So I'm not saying "this whole idea must go". I'm saying I think it's to
permissive and needs to be able to be stricter about what it's allowed to
skip, so we can differentiate between "nothing interesting here" and "um, I
think someone else consumed data I needed, I'd better bail out now". I've
never been comfortable with the skipping behaviour and found it confusing
right from the start, but now I have definite cases it can cause silent
inconsistencies and really think it needs to be modified.

Robert's point that we could keep track of the skippable range is IMO a
good one. An extra slot attribute with the last LSN that resulted in the
output plugin doing a write to the client would be sufficient, at least at
this point. To anticipate future needs where we might want to allow output
plugins to ignore some things, control could be handed to the output plugin
by allowing it to also make a function call for the position to be
explicitly advanced even if it performs no writes.

That way we can safely skip ahead if the client asks us for an LSN equal to
or after the last real data we sent them, but refuse to skip if we sent
them data after the LSN they're asking for.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-08-08 03:15:56 Re: garbage in xml regress test
Previous Message Noah Misch 2016-08-08 01:50:40 Re: No longer possible to query catalogs for index capabilities?