Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> On 2/26/10 10:53 AM, Tom Lane wrote:
>>> I think that what we are going to have to do before we can ship 9.0
>>> is rip all of that stuff out and replace it with the sort of closed-loop
>>> synchronization Greg Smith is pushing. It will probably be several
>>> months before everyone is forced to accept that, which is why 9.0 is
>>> not going to ship this year.
>> I don't think that publishing visibility info back to the master ... and
>> subsequently burdening the master substantially for each additional
>> slave ... are what most users want.
> I don't see a "substantial additional burden" there. What I would
> imagine is needed is that the slave transmits a single number back
> --- its current oldest xmin --- and the walsender process publishes
> that number as its transaction xmin in its PGPROC entry on the master.
The additional burden comes from the old snapshot effect. It makes it
unusable for offloading reporting queries, for example. In general, it
is a very good architectural property that the master is not affected by
what happens in a standby, and a closed-loop synchronization would break
I don't actually understand how tight synchronization on its own would
solve the problem. What if the connection to the master is lost? Do you
kill all queries in the standby before reconnecting?
One way to think about this is to first consider a simple a stop-and-go
system. Clearly the database must be consistent at any point in the WAL
sequence, if recovery was stopped and the database started up. So it is
always safe to pause recovery and run a read-only query against the
database as it is at that point in time (this assumes that the index
"cleanup" operations are not required for consistent query results BTW).
After the read-only transaction is finished, you can continue recovery.
The next step up is to relax that so that you allow replay of those WAL
records that are known to not cause trouble to the read-only queries.
For example, heap_insert records are very innocent, they only add rows
with a yet-uncommitted xmin.
Things get more complex when you allow the replay of commit records; all
the known-assigned-xids tracking is related to that, so that
transactions that are not committed when a snapshot is taken in the
standby to be considered uncommitted by the snapshot even after the
commit record is later replayed. If that feels too fragile, there might
be other methods to achieve that. One I once pondered is to not track
all in-progress transactions in shared memory like we do now, but only
OldestXmin. When a backend wants to take a snapshot in the slave, it
memcpy()s clog from OldestXmin to the latest committed XID, and includes
it in the snapshot. The visibility checks use the copy instead of the
actual clog, so they see the situation as it was when the snapshot was
taken. To keep track of the OldestXmin in the slave, the master can emit
that as a WAL record every now and then; it's ok if it lags behind.
Then there's the WAL record types that remove data that might still be
required by the read-only transactions. This includes vacuum and index
If you really think the current approach is unworkable, I'd suggest that
we fall back to a stop-and-go system, where you either let the recovery
to progress or allow queries to run, but not both at the same time. But
FWIW I don't think the situation is that grave.
In response to
pgsql-hackers by date
|Next:||From: Greg Stark||Date: 2010-02-26 20:10:11|
|Subject: Re: Hot Standby query cancellation and Streaming Replication integration|
|Previous:||From: Alex Hunsaker||Date: 2010-02-26 20:02:40|
|Subject: Re: Avoiding bad prepared-statement plans.|