Re: Physical replication slot advance is not persistent

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Physical replication slot advance is not persistent
Date: 2020-01-21 03:05:57
Message-ID: 20200121030557.GC2552@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 21, 2020 at 09:44:12AM +0800, Craig Ringer wrote:
> PLEASE do not make the streaming replication interface force flushes!

Yeah, that's a bad idea. FWIW, my understanding is that this has been
only proposed in v3, and this has been discarded:
https://www.postgresql.org/message-id/175c2760666a78205e053207794c0f8f@postgrespro.ru

> The replication interface should not immediately flush changes to the
> slot replay position on advance. It should be marked dirty and left to
> be flushed by the next checkpoint. Doing otherwise potentially
> introduces a lot of unnecessary fsync()s and may have an unpleasant
> impact on performance.

Some portions of the advancing code tells a different story. It seems
to me that the intention behind the first implementation of slot
advancing was to get things flushed if any advancing was done. The
check doing that is actually broken from the start, but that's another
story. Could you check with Petr what was the intention here or drag
his attention to this thread? He is the original author of the
feature. So his output would be nice to have.

> Clients of the replication protocol interface should be doing their
> own position tracking on the client side. They should not ever be
> relying on the server side restart position for correctness, since it
> can go backwards on crash and restart. Any that do rely on it are
> incorrect. I should propose a docs change that explains how the server
> and client restart position tracking interacts on both phy and logical
> rep since it's not really covered right now and naïve client
> implementations will be wrong.
>
> I don't really care if the SQL interface forces an immediate flush
> since it's never going to have good performance anyway.

Okay, the flush could be optional as well, but that's a different
discussion. The docs of logical decoding mention that slot data may
go backwards in the event of a crash. If you have improvements for
that, surely that's welcome.

> The SQL interface advances the slot restart position and marks the
> slot dirty *before the client has confirmed receipt of the data and
> flushed it to disk*. So there's a data loss window. If the client
> disconnects or crashes before all the data from that function call is
> safely flushed to disk it may lose the data, then be unable to fetch
> it again from the server because of the restart_lsn position advance.

Well, you have the same class of problems with for example synchronous
replication. The only state a client can be sure of is that it
received a confirmation that the operation happens and completed.
Any other state tells that the operation may have happened. Or not.
Now, being sure that the data of the new slot has been flushed once
the advancing function is done once the client got the confirmation
that the work is done is a property which could be interesting to some
class of applications.

> Really, we should add a "no_advance_position" option to the SQL
> interface, then expect the client to call a second function that
> explicitly advances the restart_lsn and confirmed_flush_lsn. Otherwise
> no SQL interface client can be crashsafe.

Hm. Could you elaborate more what you mean here? I am not sure to
understand. Note that calling the advance function multiple times has
no effects, and the result returned is the actual restart_lsn (or
confirmed_flush_lsn of course).
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-01-21 03:15:14 Re: PATCH: standby crashed when replay block which truncated in standby but failed to truncate in master node
Previous Message Andreas Joseph Krogh 2020-01-21 03:05:17 Re: Minimal logical decoding on standbys