Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
Date: 2016-09-05 02:41:05
Message-ID: CAMsr+YFwNqzA_CYPoV2tXySi=Nh9fAm_hc=5ZOdRKxGm6H2PXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2 September 2016 at 17:49, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> So the main change becomes the one-liner in my prior mail.

Per feedback from Simon, updated with a new test in src/test/recovery .

If you revert the change to
src/backend/replication/logical/logicalfuncs.c then the test will
start failing.

I'd quite like to backpatch the fix since it's simple and safe, but I
don't feel that it's hugely important to do so. This is an annoyance
not a serious data risking bug.

My only concern here is that we still lose position on crash. So
SQL-interface callers should still be able to cope with it. But they
won't see it happen if they're only testing with normal shutdowns,
host reboots, etc. In practice users aren't likely to notice this
anyway, though, since most people don't restart the server all the
time. I think it's better than what we have.

This issue could be eliminated completely by calling
ReplicationSlotSave() after ReplicationSlotMarkDirty(). This would
force an immediate flush after a non-peeked SQL interface decoding
call. It means more fsync()ing, though, and the SQL interface can only
be used by polling so that could be a significant performance hit.
We'd want to only flush when the position changed, but even then it'll
mean a sync every time anything gets returned.

The better alternative is to add a variant on
pg_logical_slot_get_changes(...) etc that accepts a start LSN. But
it's not convenient or easy for SQL callers to extract the last commit
LSN received from the last call to pass it to the next one, so this
isn't simple, and is probably best tackled as part of the SQL
interface API change Petr and Andres discussed in this thread.

I'm inclined to just dirty the slot for now, then provide a better
solution by adding the peek/confirm interface discussed upthread down
the track.

Andres, Petr, got an opinion here?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2016-09-05 02:47:24 Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
Previous Message Mithun Cy 2016-09-05 02:29:02 Re: Patch: Implement failover on libpq connect level.