Re: Global snapshots

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global snapshots
Date: 2018-05-15 12:53:52
Message-ID: CA+Tgmobcd5u4nj-BXFsMx7PrhscHho1z9926y1Xz_og8V1hi2A@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 14, 2018 at 7:20 AM, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:
> Summarising, I think, that introducing some permanent connections to
> postgres_fdw node will put too much burden on this patch set and that it will
> be possible to address that later (in a long run such connection will be anyway
> needed at least for a deadlock detection). However, if you think that current
> behavior + STO analog isn't good enough, then I'm ready to pursue that track.

I don't think I'd be willing to commit to a particular approach at
this point. I think the STO analog is an interesting idea and worth
more investigation, and I think the idea of a permanent connection
with chatter that can be used to resolve deadlocks, coordinate shared
state, etc. is also an interesting idea. But there are probably lots
of ideas that somebody could come up with in this area that would
sound interesting but ultimately not work out. Also, an awful lot
depends on quality of implementation. If you come up with an
implementation of a permanent connection for coordination "chatter",
and the patch gets rejected, it's almost certainly not a sign that we
don't want that thing in general. It means we don't want yours. :-)

Actually, I think if we're going to pursue that approach, we ought to
back off a bit from thinking about global snapshots and think about
what kind of general mechanism we want. For example, maybe you can
imagine it like a message bus, where there are a bunch of named
channels on which the server publishes messages and you can listen to
the ones you care about. There could, for example, be a channel that
publishes the new system-wide globalxmin every time it changes, and
another channel that publishes the wait graph every time the deadlock
detector runs, and so on. In fact, perhaps we should consider
implementing it using the existing LISTEN/NOTIFY framework: have a
bunch of channels that are predefined by PostgreSQL itself, and set
things up so that the server automatically begins publishing to those
channels as soon as anybody starts listening to them. I have to
imagine that if we had a good mechanism for this, we'd get all sorts
of proposals for things to publish. As long as they don't impose
overhead when nobody's listening, we should be able to be fairly
accommodating of such requests.
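
As a rough illustration of the "no overhead when nobody's listening" property, here is a minimal in-process sketch in Python. It is not PostgreSQL's LISTEN/NOTIFY implementation or API; the class name ChannelBus, its methods, and the channel name "global_xmin" are all hypothetical, chosen only to show the shape of the idea: the publisher hands over a function that builds the payload, and that function is called only if the channel has at least one listener.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ChannelBus:
    """Toy message bus (hypothetical): named channels, lazy publishing."""

    def __init__(self) -> None:
        # channel name -> callbacks registered by listeners
        self._listeners: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def listen(self, channel: str, callback: Callable[[str], None]) -> None:
        self._listeners[channel].append(callback)

    def unlisten(self, channel: str, callback: Callable[[str], None]) -> None:
        self._listeners[channel].remove(callback)
        if not self._listeners[channel]:
            del self._listeners[channel]

    def publish(self, channel: str, make_payload: Callable[[], str]) -> bool:
        """Build and deliver a payload only if somebody is listening.

        Returns True if the payload was built and delivered.
        """
        listeners = self._listeners.get(channel)
        if not listeners:
            return False  # nobody listening: the payload is never built
        payload = make_payload()
        for callback in list(listeners):
            callback(payload)
        return True


# Usage: count how often the (possibly expensive) payload is computed.
bus = ChannelBus()
received: List[str] = []
payload_builds: List[int] = []


def build_xmin_payload() -> str:
    payload_builds.append(1)
    return "xmin=1234"  # made-up value for illustration


delivered_before = bus.publish("global_xmin", build_xmin_payload)
bus.listen("global_xmin", received.append)
delivered_after = bus.publish("global_xmin", build_xmin_payload)
```

In this sketch the first publish does no work at all, and only the second one builds and delivers the message. A real server-side version would also have to decide what happens to listeners that connect between two changes, which this toy ignores.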

Or maybe that model is too limiting, either because we don't want to
broadcast to everyone but rather send specific messages to specific
connections, or else because we need a request-and-response mechanism
rather than what is in some sense a one-way communication channel.
Regardless, we should start by coming up with the right model for the
protocol first, bearing in mind how it's going to be used and other
things for which somebody might want to use it (deadlock detection,
failover, leader election), and then implement whatever we need for
global snapshots on top of it. I don't think that writing the code
here is going to be hugely difficult, but coming up with a good design
is going to require some thought and discussion.

And, for that matter, I think the same thing is true for global
snapshots. The coding is a lot harder for that than it is for some
new subprotocol, I'd imagine, but it's still easier than coming up
with a good design. As far as I can see, and everybody can decide for
themselves how far they think that is, the proposal you're making now
sounds like a significant improvement over the XTM proposal. In
particular, the provisioning and deprovisioning issues sound like they
have been thought through a lot more. I'm happy to call that
progress. At the same time, progress on a journey is not synonymous
with arrival at the destination, and I guess it seems to me that you
have some further research to do along the lines you've described:

1. Can we hold back xmin only when necessary and to the extent
necessary instead of all the time?
2. Can we use something like an STO analog, maybe as an optional
feature, rather than actually holding back xmin?

And I'd add:

3. Is there another approach altogether that doesn't rely on holding
back xmin at all?

For example, if you constructed the happens-after graph between
transactions in shared memory, including actions on all nodes, and
looked for cycles, you could abort transactions that would complete a
cycle. (We say A happens-after B if A reads or writes data previously
written by B.) If no cycle exists then all is well. I'm pretty sure
it's been well-established that a naive implementation of this
algorithm is terribly unperformant, but for example SSI works on this
principle. It reduces the bookkeeping involved by being willing to
abort transactions that aren't really creating a cycle if they look
like they *might* create a cycle. Now that's an implementation *on
top of* snapshots for the purpose of getting true serializability
rather than a way of getting global snapshots per se, so it's not
suitable for what you're trying to do here, but I think it shows that
algorithms based on cycle detection can be made practical in some
cases, and so maybe this is another such case. On the other hand,
this whole line of thinking could also be a dead end...
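
To make the naive version of the cycle check concrete, here is a small single-process sketch in Python. It is only an illustration of the idea, not SSI and not a proposal for the actual data structure: the class HappensAfterGraph and its methods are hypothetical names, the graph lives in an ordinary dictionary rather than shared memory, and every check does a full graph search, which is exactly the kind of cost a practical implementation would need to avoid.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


class HappensAfterGraph:
    """Naive happens-after graph (hypothetical sketch).

    An edge a -> b means "a happens-after b", i.e. a read or wrote data
    previously written by b.
    """

    def __init__(self) -> None:
        self._after: Dict[Hashable, Set[Hashable]] = defaultdict(set)

    def _reaches(self, start: Hashable, target: Hashable) -> bool:
        # Iterative depth-first search along happens-after edges.
        stack = [start]
        seen: Set[Hashable] = set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._after.get(node, ()))
        return False

    def add_dependency(self, a: Hashable, b: Hashable) -> bool:
        """Record that a happens-after b.

        Returns False, without recording the edge, if the edge would
        complete a cycle; the caller would then abort a.
        """
        if a == b:
            return True  # a transaction seeing its own writes is no cycle
        if self._reaches(b, a):
            return False  # b already happens-after a, so a -> b closes a cycle
        self._after[a].add(b)
        return True

    def forget(self, txn: Hashable) -> None:
        """Drop a transaction (for example, after it aborts)."""
        self._after.pop(txn, None)
        for deps in self._after.values():
            deps.discard(txn)


# Usage: T2 after T1, T3 after T2, then T1 after T3 would close the loop.
graph = HappensAfterGraph()
ok_first = graph.add_dependency("T2", "T1")
ok_second = graph.add_dependency("T3", "T2")
ok_third = graph.add_dependency("T1", "T3")  # rejected: completes a cycle
```

The sketch never removes finished transactions on its own; deciding when an edge can safely be forgotten is part of what makes the real problem hard.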

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
