Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
Date: 2016-09-02 07:58:33
Message-ID: CAMsr+YGGtHxpRx1v=_FOrW7hHdw=udr=BVcH=ypA8gfMZMZ28A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1 September 2016 at 21:19, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> I agree the doc patch should go in, though I suggest reword it
> slightly, like attached patch.

Thanks. The rewording looks good to me and I think that
Doc-correction-each-change-once.v2.patch is ready for commit.

Meanwhile, thinking about the patch to dirty slots on
confirmed_flush_lsn advance some more, I don't think it's ideal to do
it in LogicalConfirmReceivedLocation(). I'd rather not add more
complexity there, it's already complex enough. Doing it there will
also lead to unnecessary slot write-out being done to slots at normal
checkpoints even if the slot hasn't otherwise changed, possibly in
response to lsn advances sent in response to keepalives due to
activity on other unrelated databases. Slots are always fsync()ed when
written out so we don't want to do it more than we have to.

We really only need to write out slots where only the
confirmed_flush_lsn has changed at a shutdown checkpoint since it's
not really a big worry if it goes backwards on crash, and otherwise it
can't. Even then it only _really_ matters when the SQL interface is
used. Losing the confirmed_flush_lsn is very annoying when using
pg_recvlogical too, and was the original motivation for this patch.
But I'm thinking of instead teaching pg_recvlogical to write out a
status file with its last confirmed point on exit and to be able to
take that as an argument when (re)connecting. Poor-man's replication
origins, effectively.

So here's a simpler patch that just dirties the slot when it's
replayed something from it on the SQL interface, so it's flushed out
next checkpoint or on shutdown. That's the main user visible defect
that should be fixed and it's trivial to do here. It means we'll still
forget the confirmed_flush_lsn on clean shutdown if it was advanced
using the walsender protocol, but *shrug*. That's just a little
inconvenient. I can patch pg_recvlogical separately.

The alternative is probably to add a second, "softer" dirty tracking
method that only causes a write at a clean shutdown or forced
checkpoint - and maybe doesn't bother fsync()ing. That's a bit more
invasive but would work for walsender use as well as the SQL
interface. I don't think it's worth the bother, since in the end
callers have to be prepared for repeated data on crash anyway.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Dirty-replication-slots-when-accessed-via-sql-interf.patch text/x-patch 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-09-02 08:13:18 Re: Surprising behaviour of \set AUTOCOMMIT ON
Previous Message Rahila Syed 2016-09-02 07:42:37 Re: Surprising behaviour of \set AUTOCOMMIT ON