Re: Reporting the commit LSN at commit time

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reporting the commit LSN at commit time
Date: 2014-08-08 03:22:17
Message-ID: 53E44269.10809@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/08/2014 09:51 AM, Tom Lane wrote:

>> AFAIK we don't _have_ a fancy negotiation system in the protocol, with
>> back-and-forth exchanges of capabilities information.
>
> Maybe it's time to invent that. It would be positively foolish to
> create any such behavior without a protocol version bump anyway.

I was hoping it'd be easier to sneak a new message type in without a
full protocol bump. As you can imagine that makes it a ... rather larger
job.

Still, if it's unsafe to do it that way...

> Although as I said, I don't think embedding knowledge of LSNs at the
> protocol level is a good thing to begin with.

As I said upthread, it need not necessarily be an LSN. A commit
timestamp would do the job too, if information about the last-replayed
commit timestamp was accessible on the downstream node.

It needs to be a sequence identifier that can be matched against
pg_replication_slots / pg_stat_replication or passed to a function on
the downstream end to say "wait until we're replayed to this point".

For streaming replication there's only one upstream, so there's no need
to identify it. For BDR you'd also have to identify the upstream node of
interest - probably by slot ID, or by (sysid, tlid, dboid) tuple.

In the end, it can be an opaque magic cookie. It really doesn't matter,
so long as what the client receives is a value it can pass to another Pg
instance and say "wait until you've replayed up to this, please" or
"have you replayed up to this yet?".

> Is it really necessary that this information be pushed to the client
on every commit, as opposed to the client asking for it occasionally?

I think so, yes, though I'd be glad to be proved wrong.

For the purpose of transparent failover (BDR) at least, the server
currently being written to can go away at any moment, and you should
know exactly what you're up to in order to make it safe to continue on
another server.

Consider, for a multi-master configuration where two servers replicate
to each other:

On a connection to server1:

INSERT INTO bird(id, parrot)
VALUES (1, 'African Grey');

[client grabs magic cookie for server replay state]

INSERT INTO bird(id, parrot)
VALUES (2, 'Lorikkeet');

[server1 sends changes to server2, which is behind on replay
and still working on catching up]

[server1 dies abruptly]

[client drops connection to server1, connects to server2]

-- Correct spelling
UPDATE bird
SET parrot = 'Lorikeet'
WHERE id = 2;

If the INSERT from server1 hasn't replayed on server2 yet this will
fail. Other anomalies can be worse and cause lost updates, etc.

To protect against this the client needs a way to wait, after connecting
to server2, until it's caught up with the state of server1. That's what
I'm talking about here. In this case, if you used a periodic progress
indicator requested by the client, you'd still get the same error,
because you'd wait until the first INSERT but not the second.

So yes, the client needs this info at every commit.

That means that enabling client-side fail-over won't be free, especially
for lots of small transactions. It'll be cheaper if Pg can push the info
with the commit confirmation instead of the client having to request it
afterwards though.

(Note that the client risks waiting forever if server1 didn't send the
required commits before it died, but that's where application policy
decisions come in).

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2014-08-08 03:24:10 Re: Introducing coarse grain parallelism by postgres_fdw.
Previous Message Tom Lane 2014-08-08 03:17:43 jsonb format is pessimal for toast compression