Re: Hot standby, race condition between recovery snapshot and commit

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, race condition between recovery snapshot and commit
Date: 2009-11-15 19:37:59
Message-ID: 4B005897.8040507@enterprisedb.com
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> Have you forgotten that
>> discussion so completely that you can't even remember the existence of
>> other options?
>
> I do remember that. I've been thinking about the looser approach a lot
> since yesterday.
>
> So, if we drop the notion that the running-xacts record represents the
> situation at the exact moment it appears in WAL, what do we have to
> change? Creating the running-xacts snapshot becomes easier, but when we
> replay it, we must take the snapshot with a grain of salt.
>
> 1. the snapshot can contain xids that have already finished (= we've
> already seen the commit/abort record)
> 2. the snapshot can lack xids belonging to transactions that have just
> started, in the window between when the running-xacts snapshot is taken
> on the master and when it's written to WAL.
>
> Problem 1 is quite easy to handle: just check every xid in clog. If it's
> marked there as finished already, it can be ignored.
>
> For problem 2, if a transaction hasn't written any WAL yet, we might as
> well treat it as not-yet-started in the standby, so we're concerned
> about transactions that have written a WAL record between when the
> running-xacts snapshot was taken and written to WAL. Assuming the
> snapshot was taken after the REDO pointer of the checkpoint record, the
> standby has seen the WAL record and therefore has all the information it
> needs. Currently, the standby doesn't add xids to the known-assigned
> list until it sees the running-xacts record, but we could change that.

Ok, I tried out that approach. Attached is a complete patch against CVS
HEAD (see commit db15148b930 in the git branch for the diff against the
old approach):

- We start tracking transactions in the known-assigned hash table
immediately from the start of WAL replay. We have to do that because the
running-xacts record we will eventually see can lack XIDs belonging to
transactions that started between when the running-xacts snapshot was
taken and written to WAL. If we start tracking at the running-xacts
record, we will miss them. To keep the size of the known-assigned table
bounded, we ignore any XIDs smaller than the oldest XID present in the
running-xacts record (any such transaction must've finished before the
running-xacts record, so we're not interested in them). We wouldn't know
the oldest running XID until we see the running-xacts record, so we
store it in the checkpoint record too, which we have access to right
from the start.

- StartupCLOG/SUBTRANS/MultiXact are now called at the beginning of WAL
replay. We used to delay that until we saw the running-xacts record, but
that always felt a bit weird to me. StartupSUBTRANS takes the
oldest-running-xid as an argument, but now that we store that in the
checkpoint record, that's not a problem.

- Because the running-xacts record can contain XIDs belonging to
transactions that finished before the record was written to WAL, we
ignore any already-finished XIDs when it's replayed.

- The running-xacts record is written to WAL before the checkpoint
record. That guarantees that WAL replay will see it.

- RecoveryInfoLock is no longer needed.
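
To make the replay-side bookkeeping in the list above concrete, here is a
rough standalone sketch in C. It is not code from the patch: every name in
it (ReplayState, replay_xid_seen, and so on) is made up for illustration, a
small array stands in for the known-assigned hash table, and a second array
stands in for a clog lookup. The XID comparison is the usual modulo-2^32
one and ignores the special XIDs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;            /* stand-in for a 32-bit transaction ID */

#define MAX_XIDS 64              /* hypothetical capacity, for the sketch only */

/* A tiny unordered set of XIDs; the patch uses a hash table instead. */
typedef struct XidSet {
    Xid    xids[MAX_XIDS];
    size_t n;
} XidSet;

typedef struct ReplayState {
    Xid    oldest_running;       /* lower bound read from the checkpoint record */
    XidSet known_assigned;       /* XIDs the standby treats as running */
    XidSet finished;             /* assumption: stands in for a clog lookup */
} ReplayState;

/* Wraparound-aware "a is older than b" for ordinary XIDs. */
static bool xid_precedes(Xid a, Xid b)
{
    return (int32_t)(a - b) < 0;
}

static bool set_contains(const XidSet *s, Xid xid)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->xids[i] == xid)
            return true;
    return false;
}

static void set_add(XidSet *s, Xid xid)
{
    if (!set_contains(s, xid) && s->n < MAX_XIDS)
        s->xids[s->n++] = xid;
}

static void set_remove(XidSet *s, Xid xid)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->xids[i] == xid) {
            s->xids[i] = s->xids[--s->n];   /* move the last element here */
            return;
        }
}

/* Start of WAL replay: the bound comes from the checkpoint record. */
static void replay_init(ReplayState *st, Xid oldest_running)
{
    st->oldest_running = oldest_running;
    st->known_assigned.n = 0;
    st->finished.n = 0;
}

/* Any WAL record carrying an XID: track it unless it is below the bound. */
static void replay_xid_seen(ReplayState *st, Xid xid)
{
    if (xid_precedes(xid, st->oldest_running))
        return;                  /* finished before the running-xacts record */
    set_add(&st->known_assigned, xid);
}

/* Commit or abort record: the XID is no longer running. */
static void replay_xact_end(ReplayState *st, Xid xid)
{
    set_remove(&st->known_assigned, xid);
    set_add(&st->finished, xid);
}

/* Running-xacts record: merge it in, skipping already-finished XIDs. */
static void replay_running_xacts(ReplayState *st, const Xid *xids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (set_contains(&st->finished, xids[i]))
            continue;            /* its commit/abort was replayed earlier */
        set_add(&st->known_assigned, xids[i]);
    }
}

/* Example: the snapshot holds 100 and 102; 103 starts and 102 commits
 * before the snapshot reaches WAL. Returns the number of tracked XIDs. */
static size_t example_replay(void)
{
    static const Xid snapshot[] = {100, 102};
    ReplayState st;

    replay_init(&st, 100);
    replay_xid_seen(&st, 99);    /* below the bound: ignored */
    replay_xid_seen(&st, 102);
    replay_xid_seen(&st, 103);   /* missing from the snapshot, still tracked */
    replay_xact_end(&st, 102);
    replay_running_xacts(&st, snapshot, 2);
    return st.known_assigned.n;  /* 100 and 103 remain */
}
```

In example_replay, XID 103 ends up tracked even though the snapshot never
mentions it, and 102 stays out even though the snapshot still lists it.
Those are the two cases the looser approach has to cope with.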

This also lays the foundation to allow standby mode even with subxid or
lock overflows. We could now emit separate log records for overflowed
subxids or locks before the running-xacts record to fill that gap.

Am I missing anything?

I also experimented with including the running-xacts information in the
checkpoint record itself. That somehow feels more straightforward to me,
but it wasn't really any less code, and it wouldn't allow us to do the
running-xacts snapshot as multiple WAL records, so the current approach
with a separate running-xacts record is better.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment: hot-standby-20091115.patch.gz (application/x-gzip, 75.3 KB)
