Re: Logical decoding slots can go backwards when used from SQL, docs are wrong

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Logical decoding slots can go backwards when used from SQL, docs are wrong
Date: 2016-03-11 10:44:01
Message-ID: CAMsr+YGF0LQ804Tq5iEKNS=cEXvtDxNrEFdUwbgiNO8TnNqzZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11 March 2016 at 17:00, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
>
>> Also, pg_logical_slot_get_changes and its _peek_ variant should have a
>> param specifying the starting LSN to read and return. If this is lower than
>> the restart_lsn but non-null it should ERROR; if it's greater than or equal
>> it should use this position instead of starting at the confirmed_lsn.
>>
>
> Maybe. I don't really like changing APIs. Can we add new funcs? Make sure
> a NULL LSN can be passed as well.
>

It'd be compatible with the old SQL API and we've already made one such
change in 9.5, to add the parameter to immediately reserve WAL for physical
slots. You just CREATE OR REPLACE FUNCTION it in system_views.sql with a
DEFAULT so the old signature still works.

That works for C-level callers too, unless they try to get clever and
DirectFunctionCall3 it. Even then it might, I don't recall off the top of
my head, but if not we can defend against that too by testing PG_NARGS in
the function its self. This wasn't done for the reserve_wal change and I'm
not convinced it's needed even if the fmgr doesn't take care of the default
parameters - it's not like that's a function that makes much sense to
DirectFunctionCall anyway.

Is the return type of pg_logical_slot_peek_changes() incorrect in the docs?
>

I don't think so, why?

> Time permitting I'd like to add a pg_logical_slot_confirm function, so you
>> can aternate _peek_changes and _confirm, making it possible to get
>> walsender-interface-like behaviour via the SQL interface.
>>
>
> I thought thats what replorigins do.
>

Replication origins provide an easy and efficient way for a PostgreSQL
downstream (client) to track its replay state and position in a reliable
way. If the client is PostgreSQL then you can use replication origins to
track the last flushed LSN, yes. If you're using the walsender interface
you must do so. They are not used by the server-side of logical decoding on
either walsender or SQL interfaces.

If you're using the SQL interface it has its own replay position tracking.
Just before pg_logical_slot_get_changes returns it updates the confirm
position of the slot, as if you'd sent a replay confirmation from the
client on the walsender protocol. The problem is that this happens before
we know the client has received and flushed all the data we just sent it.
The client might not get all (or any) of it if there are network issues,
for example.

Because the replay position can go backwards the client can and must track
the last-safely-replayed position its self, as established earlier, so it
can skip over data it already saw. The problem is that when using the SQL
interface the client can't do the reverse - it can't tell the server "send
me that data again, please, I only got half of it last time because the
network died". The server has advanced the confirmed position and there's
no way to ask for the data again over the SQL interface.

That's why I want to add a param to let it do so, replaying from prior to
the confirm location or skipping past data that'd otherwise be replayed. If
you pass NULL (the default, and the only option in 9.5 or older) for that
param then you get data from the server's last recorded confirm location,
which is the server's best-effort tracking of your replay position from
server-side.

That's also why a pg_logical_slot_confirm function could be handy. It lets
you do a _peek_changes then a _confirm when you know you've flushed them,
prior to the next peek.

I think the docs for pg_replication_slots are wrong about the confirm
location btw, it says data from before the confirm location can't be
replayed, but it can over the walsender protocol. There's just no way to
ask for it over the SQL interface.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2016-03-11 10:45:25 Re: auto_explain sample rate
Previous Message Thomas Munro 2016-03-11 10:30:33 Re: WIP: Detecting SSI conflicts before reporting constraint violations