Re: SSI and Hot Standby

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Florian Pflug" <fgp(at)phlo(dot)org>,<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI and Hot Standby
Date:	2011-01-22 00:52:28
Message-ID:	4D39D5EC0200002500039A19@gw.wicourts.gov
Lists:	pgsql-hackers

I wrote:

> We're not talking about passing them backwards. I'm suggesting
> that we probably don't even need to pass them forward, but that
> suggestion has been pretty handwavy so far. I guess I should fill
> it out, because everyone's been ignoring it so far.

It's been too hectic today to flesh this out very well, but I can at
least do a better brain dump -- you know, wave my hands a little
less vaguely.

The idea of communicating regarding a safe snapshot through the WAL
without actually *sending* snapshot XIDs through the WAL might work
something like this:

(1) We communicate when we are starting to consider a snapshot.
This would always be related to the commit or rollback of a
serializable read-write transaction, so perhaps we could include the
information in an existing WAL record. We would need to find one
free bit somewhere, or make room for it. Alternatively, we could
send a new WAL record type to communicate this. At the point that a
standby processes such a WAL record, it would grab a snapshot
effective after the commit, and save it as the "latest candidate",
releasing the previous candidate, if any.

(2) If a snapshot fails to make it to a safe status on the master,
the master will pick a new candidate and repeat (1) -- there's no
need to explicitly quash a failed candidate.

(3) We communicate when we find that the last candidate made it to
"safe" status. Again, this would be related to the commit or
rollback of a serializable read-write transaction. Same issues
about needing (another) bit or using a new record type. When a
standby receives this, it promotes the latest candidate to the new
"safe snapshot" to be used when a serializable transaction asks for
a snapshot, replacing the previous value, if any. Any transactions
waiting for a snapshot (either because there previously wasn't a
safe snapshot on record or because they requested DEFERRABLE) could
be provided the new snapshot and turned loose.

(4) It's not inconceivable that we might want to send both (1) and
(3) with the same commit.

(5) Obviously, we can pick our heuristics for how often we try to
refresh this, limiting it to avoid too much overhead, at the cost of
less frequent snapshot updates for serializable transactions on the
standbys.
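
Steps (1) through (4) can be read as a small state machine on the
standby. The Python sketch below is illustrative only, not PostgreSQL
code: the names SsiFlag, StandbySnapshots, and replay are hypothetical,
the record's LSN stands in for a real snapshot, and the ordering used
when both markers arrive on one record is an assumption that the
description above does not settle.

```python
from dataclasses import dataclass
from enum import Flag
from typing import Optional


class SsiFlag(Flag):
    """Hypothetical marker bits on a commit/rollback WAL record."""
    NEW_CANDIDATE = 1   # step (1): master starts considering a snapshot
    CANDIDATE_SAFE = 2  # step (3): the last candidate reached "safe" status


@dataclass
class StandbySnapshots:
    """Standby-side state; an int (the record's LSN) stands in for a snapshot."""
    candidate: Optional[int] = None
    safe: Optional[int] = None

    def replay(self, lsn: int, flags: SsiFlag) -> None:
        # Assumption: when both bits are set (step 4), the "safe" bit refers
        # to the candidate announced earlier, so promote before replacing.
        if SsiFlag.CANDIDATE_SAFE in flags and self.candidate is not None:
            self.safe = self.candidate
            self.candidate = None
        if SsiFlag.NEW_CANDIDATE in flags:
            # Step (2): a failed candidate is never quashed explicitly;
            # the next candidate simply overwrites it.
            self.candidate = lsn
```

Replaying NEW_CANDIDATE at 100, NEW_CANDIDATE at 110, and then
CANDIDATE_SAFE leaves 110 as the safe snapshot; the candidate from 100
is dropped without any explicit message.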

My assumption is that when we have a safe snapshot (which should be
pretty close to all the time), we immediately provide it to any
serializable transaction requesting a snapshot, except it seems to
make sense to use the new DEFERRABLE mode to mean that you want to
use the *next* one to arrive.
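
A rough way to picture that hand-out policy, again as a Python sketch
rather than anything resembling the real backend: SafeSnapshotProvider,
publish, and acquire are made-up names, an int stands in for a
snapshot, and the timeout parameter exists only so the example cannot
hang. A plain request returns the current safe snapshot as soon as one
exists; a DEFERRABLE request waits for the next promotion.

```python
import threading
from typing import Optional


class SafeSnapshotProvider:
    """Hypothetical holder of the standby's current safe snapshot."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._safe: Optional[int] = None
        self._generation = 0  # bumped each time a candidate is promoted

    def publish(self, snapshot: int) -> None:
        """Called when a candidate is promoted to the safe snapshot."""
        with self._cond:
            self._safe = snapshot
            self._generation += 1
            self._cond.notify_all()  # turn waiting transactions loose

    def acquire(self, deferrable: bool = False,
                timeout: Optional[float] = None) -> Optional[int]:
        """Return a safe snapshot, or None if the timeout expires first."""
        with self._cond:
            start = self._generation

            def ready() -> bool:
                if deferrable:
                    # DEFERRABLE: only a promotion after this call counts.
                    return self._generation > start
                return self._safe is not None

            if not self._cond.wait_for(ready, timeout=timeout):
                return None
            return self._safe
```

With this shape, a non-deferrable request only blocks when no safe
snapshot has been recorded yet, which matches the two waiting cases
described in (3).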

This would effectively cause the point in time which was visible to
serializable transactions to lag behind what is visible to other
transactions by a variable amount, but would ensure that a
serializable transaction couldn't see any serialization anomalies.
It would also be immune to serialization failures from SSI logic;
but obviously, standby-related cancellations would be in play. I
don't know whether the older snapshots would tend to increase the
standby-related cancellations, but it wouldn't surprise me.

Hopefully this is enough for people to make something of it.

-Kevin

In response to

Re: SSI and Hot Standby at 2011-01-21 16:16:21 from Kevin Grittner

Responses

Re: SSI and Hot Standby at 2011-01-22 01:32:18 from Jeff Davis
