Re: Transaction Snapshots and Hot Standby

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Transaction Snapshots and Hot Standby
Date: 2008-09-11 14:58:36
Message-ID: 48C9321C.4090007@enterprisedb.com
Lists: pgsql-hackers

Simon Riggs wrote:
> So part of the handshake between
> primary and standby must be "what is your recentxmin?". The primary will
> then use the lower/earliest of the two.

Even then, the master might already have vacuumed away tuples that are
visible to an already running transaction in the slave, before the slave
connects. Presumably the master doesn't wait for the slave to connect
before starting to accept new connections.

>> As you mentioned, the options there are to defer applying WAL, or cancel
>> queries. I think both options need the same ability to detect when
>> you're about to remove a tuple that's still visible to some snapshot,
>> just the action is different. We should probably provide a GUC to
>> control which you want.
>
> I don't see any practical way of telling whether a tuple removal will
> affect a snapshot or not. Each removed row would need to be checked
> against each standby snapshot. Even if those were available, it would be
> too costly.

How about using the same method as we use in HeapTupleSatisfiesVacuum?
Before replaying a vacuum record, look at the xmax of the tuple
(assuming it committed). If it's < slave's OldestXmin, it can be
removed. Otherwise not. Like HeapTupleSatisfiesVacuum, it's
conservative, but doesn't require any extra bookkeeping.
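
To make that concrete, here is a minimal sketch of the check, not actual
backend code. Xid stands in for TransactionId, and xid_precedes(),
ReplayDecision and check_vacuum_replay() are hypothetical names made up for
illustration. It assumes the tuple's xmax is known to have committed and
only handles normal xids, compared modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;   /* stand-in for TransactionId (assumption) */

/* Wraparound-aware "a is older than b"; valid for normal xids only. */
static bool
xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical result: replay the removal now, or treat it as a conflict. */
typedef enum
{
    REPLAY_NOW,
    REPLAY_CONFLICT
} ReplayDecision;

/*
 * Conservative test before replaying the removal of one tuple.
 * tuple_xmax is assumed to belong to a committed deleter.
 * standby_oldest_xmin is the oldest xmin of any snapshot in the slave.
 */
static ReplayDecision
check_vacuum_replay(Xid tuple_xmax, Xid standby_oldest_xmin)
{
    /* Deleted before every slave snapshot began: nobody can see it. */
    if (xid_precedes(tuple_xmax, standby_oldest_xmin))
        return REPLAY_NOW;

    /* Possibly still visible: the caller defers replay or cancels queries. */
    return REPLAY_CONFLICT;
}
```

What happens on REPLAY_CONFLICT (wait, or cancel the queries) is where the
GUC would come in; the test itself is the same either way.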

And vice versa: if we implement the more precise book-keeping, with all
snapshots in shared memory or something, we might as well use it in
HeapTupleSatisfiesVacuum. That has been discussed before, but it's a
separate project.

> It was also suggested we might take the removed rows and put them in a
> side table, but that makes me think of the earlier ideas for HOT and so
> I've steered clear of that.

Yeah, that's non-trivial. Basically a whole new, different
implementation of MVCC, but without changing any on-disk formats.

BTW, we haven't talked about how to acquire a snapshot in the slave.
You'll somehow need to know which transactions have not yet committed,
but will in the future. In the master, we keep track of in-progress
transactions in the ProcArray, so I suppose we'll need to do the same in
the slave. Very similar to prepared transactions, actually. I believe
the Abort records, which are not actually needed for normal operation,
become critical here. The slave will need to add an entry to the ProcArray
for any new XLogRecord.xl_xid it sees in the WAL, and remove the entry
at a Commit or Abort record. And clear them all at a shutdown record.
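
Roughly, the bookkeeping could look like the sketch below. This is a toy
model under stated assumptions, not the real ProcArray: StandbyXids, RecKind
and standby_apply_record() are hypothetical names, the table is a small
fixed-size array, and an xid of 0 is taken to mean the record carries no
transaction id.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 64          /* arbitrary capacity for the sketch */

typedef uint32_t Xid;           /* stand-in for TransactionId (assumption) */

/* Hypothetical classification of a WAL record, for illustration only. */
typedef enum { REC_OTHER, REC_COMMIT, REC_ABORT, REC_SHUTDOWN } RecKind;

/* Toy replacement for the ProcArray: xids believed to be in progress. */
typedef struct
{
    Xid     xids[MAX_TRACKED];
    size_t  n;
} StandbyXids;

static bool
standby_xid_in_progress(const StandbyXids *s, Xid xid)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->xids[i] == xid)
            return true;
    return false;
}

static void
standby_remove_xid(StandbyXids *s, Xid xid)
{
    for (size_t i = 0; i < s->n; i++)
    {
        if (s->xids[i] == xid)
        {
            s->xids[i] = s->xids[--s->n];   /* order does not matter */
            return;
        }
    }
}

/* Update the table for one replayed record; false means the table is full. */
static bool
standby_apply_record(StandbyXids *s, RecKind kind, Xid xid)
{
    if (kind == REC_SHUTDOWN)
    {
        s->n = 0;               /* nothing survives a shutdown */
        return true;
    }
    if (xid == 0)
        return true;            /* record carries no transaction id */
    if (kind == REC_COMMIT || kind == REC_ABORT)
    {
        standby_remove_xid(s, xid);
        return true;
    }
    if (standby_xid_in_progress(s, xid))
        return true;            /* already tracked */
    if (s->n == MAX_TRACKED)
        return false;
    s->xids[s->n++] = xid;      /* first record seen for this xid */
    return true;
}
```

A snapshot in the slave would then treat every xid still in the table as
in progress. Without the Abort records, entries for aborted transactions
would never go away, which is why they matter here.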

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
