On Wed, 2011-07-27 at 22:51 -0400, Robert Haas wrote:
> On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I wonder whether we could do something involving WAL properties --- the
> > current tuple visibility logic was designed before WAL existed, so it's
> > not exploiting that resource at all. I'm imagining that the kernel of a
> > snapshot is just a WAL position, ie the end of WAL as of the time you
> > take the snapshot (easy to get in O(1) time). Visibility tests then
> > reduce to "did this transaction commit with a WAL record located before
> > the specified position?". You'd need some index datastructure that made
> > it reasonably cheap to find out the commit locations of recently
> > committed transactions, where "recent" means "back to recentGlobalXmin".
> > That seems possibly do-able, though I don't have a concrete design in
> > mind.
> I was mulling this idea over some more (the same ideas keep floating
> back to the top...). I don't think an LSN can actually work, because
> there's no guarantee that the order in which the WAL records are
> emitted is the same order in which the effects of the transactions
> become visible to new snapshots. For example:
> 1. Transaction A inserts its commit record, flushes WAL, and begins
> waiting for sync rep.
> 2. A moment later, transaction B sets synchronous_commit=off, inserts
> its commit record, requests a background WAL flush, and removes itself
> from the ProcArray.
> 3. Transaction C takes a snapshot.
It is Transaction A here which is acting badly - it should also remove
itself from the ProcArray right after it inserts its commit record, since for
everybody else except the client app of transaction A it is committed at
this point. It just can't report back to the client before getting
confirmation that it is actually syncrepped (or locally written to
At least from the standpoint of consistent snapshots, the right sequence should be:
1) insert commit record into WAL
2) remove yourself from ProcArray (or use some other means to declare
that your transaction is no longer running)
3) if so configured, wait for WAL flush to stable storage and/or Sync Rep
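The ordering above can be illustrated with a small single-threaded toy in C. This is only a sketch under stated assumptions: `proc_array`, `wal_insert_lsn`, `wal_flush_lsn` and all function names are hypothetical stand-ins, not PostgreSQL's actual structures or APIs, and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BACKENDS 4      /* hypothetical fixed number of backends */
#define INVALID_XID  0u     /* marks an empty slot */

typedef uint32_t xid_t;
typedef uint64_t lsn_t;

static xid_t proc_array[MAX_BACKENDS]; /* toy ProcArray: one xid per backend */
static lsn_t wal_insert_lsn = 0;       /* end of WAL inserted so far */
static lsn_t wal_flush_lsn  = 0;       /* end of WAL known to be on stable storage */

void start_transaction(int backend, xid_t xid)
{
    assert(backend >= 0 && backend < MAX_BACKENDS);
    proc_array[backend] = xid;
}

/* Steps 1 and 2: returns the commit LSN the backend still has to wait for. */
lsn_t commit_make_visible(int backend)
{
    assert(backend >= 0 && backend < MAX_BACKENDS);
    lsn_t commit_lsn = ++wal_insert_lsn;  /* 1) insert commit record into WAL */
    proc_array[backend] = INVALID_XID;    /* 2) declare "no longer running" */
    return commit_lsn;
}

/* Step 3 is a wait until this becomes true; only then is the client told. */
bool commit_is_durable(lsn_t commit_lsn)
{
    return wal_flush_lsn >= commit_lsn;
}

/* Stand-in for a WAL flush (or sync rep confirmation) completing. */
void flush_wal(void)
{
    wal_flush_lsn = wal_insert_lsn;
}

bool is_running(xid_t xid)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
        if (proc_array[i] == xid)
            return true;
    return false;
}
```

In this toy, a transaction stops being "running" for other backends as soon as step 2 completes, while `commit_is_durable()` stays false until the flush happens, which is the window during which the client has not yet been answered.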
Based on this, let me suggest a simple snapshot cache mechanism.
A simple snapshot cache mechanism
Have an array of running transactions, with one slot per backend.
There are exactly 3 operations on this array:
1. insert backend's running transaction id
this is done at the moment of acquiring your transaction id from the system,
and synchronized by the same mechanism as getting the transaction id
running_transactions[my_backend] = current_transaction_id
2. remove backend's running transaction id
this is done at the moment of committing or aborting the transaction,
again synchronized by the write commit record mechanism.
running_transactions[my_backend] = NULL
this should be the first thing after inserting the WAL commit record
3. getting a snapshot
memcpy() running_transactions to local memory, then construct a snapshot
It may be that you need to protect all 3 operations with a single
spinlock; if so, then I'd propose the same spinlock used when getting
your transaction id (and placing the array near where the latest transaction
id is stored so they share a cache line).
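A minimal sketch of the three operations under a single spinlock, written in C11. The spinlock built on `atomic_flag`, the `snapshot_t` layout, the `xmax` field, and the helper `xid_in_progress()` are assumptions made for illustration, not anything from the PostgreSQL source; xid wraparound is ignored, and `INVALID_XID` stands in for the NULL used above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define MAX_BACKENDS 8      /* hypothetical fixed number of backends */
#define INVALID_XID  0u     /* plays the role of NULL in the description */

typedef uint32_t xid_t;

/* The xid counter and the array are kept together and share one spinlock. */
static xid_t next_xid = 1;
static xid_t running_transactions[MAX_BACKENDS];
static atomic_flag xid_lock = ATOMIC_FLAG_INIT;

static void xid_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&xid_lock, memory_order_acquire))
        ;   /* spin */
}

static void xid_lock_release(void)
{
    atomic_flag_clear_explicit(&xid_lock, memory_order_release);
}

/* Operation 1: acquire an xid and publish it in this backend's slot. */
xid_t begin_transaction(int my_backend)
{
    assert(my_backend >= 0 && my_backend < MAX_BACKENDS);
    xid_lock_acquire();
    xid_t xid = next_xid++;
    running_transactions[my_backend] = xid;
    xid_lock_release();
    return xid;
}

/* Operation 2: clear the slot right after the commit or abort record. */
void end_transaction(int my_backend)
{
    assert(my_backend >= 0 && my_backend < MAX_BACKENDS);
    xid_lock_acquire();
    running_transactions[my_backend] = INVALID_XID;
    xid_lock_release();
}

typedef struct {
    xid_t xmax;                   /* first xid not yet assigned at snapshot time */
    xid_t running[MAX_BACKENDS];  /* copy of the shared array */
} snapshot_t;

/* Operation 3: memcpy() the array to local memory under the same lock. */
void get_snapshot(snapshot_t *snap)
{
    xid_lock_acquire();
    snap->xmax = next_xid;
    memcpy(snap->running, running_transactions, sizeof running_transactions);
    xid_lock_release();
}

/* Returns 1 if xid was still running (or not yet started) for this snapshot. */
int xid_in_progress(const snapshot_t *snap, xid_t xid)
{
    if (xid >= snap->xmax)
        return 1;
    for (int i = 0; i < MAX_BACKENDS; i++)
        if (snap->running[i] == xid)
            return 1;
    return 0;
}
```

Note that `xid_in_progress()` only answers whether a transaction had finished when the snapshot was taken; telling commit from abort would still need a separate lookup, which this sketch leaves out.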
But it is also possible that you can get logically consistent snapshots
by protecting only some ops. For example, if you protect only insert and
get snapshot, then the worst that can happen is that you get a snapshot
that is a few commits older than what you'd get with full locking, and it
may well be ok for all real uses.
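One way the partially locked variant might look in C11 is sketched below, purely as an illustration. It assumes the slots are atomic so that the unlocked removal is a plain atomic store; the names, the `atomic_flag` spinlock and the chosen memory orderings are all hypothetical, and whether the resulting snapshots are acceptable is exactly the open question raised above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_BACKENDS 8      /* hypothetical fixed number of backends */
#define INVALID_XID  0u     /* marks an empty slot */

typedef uint32_t xid_t;

static xid_t next_xid = 1;
static _Atomic xid_t running_transactions[MAX_BACKENDS];
static atomic_flag xid_lock = ATOMIC_FLAG_INIT;

static void xid_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&xid_lock, memory_order_acquire))
        ;   /* spin */
}

static void xid_lock_release(void)
{
    atomic_flag_clear_explicit(&xid_lock, memory_order_release);
}

/* Insert: still done under the xid spinlock. */
xid_t begin_transaction(int my_backend)
{
    assert(my_backend >= 0 && my_backend < MAX_BACKENDS);
    xid_lock_acquire();
    xid_t xid = next_xid++;
    atomic_store_explicit(&running_transactions[my_backend], xid,
                          memory_order_relaxed);
    xid_lock_release();
    return xid;
}

/* Remove: NOT locked, just an atomic store after the commit record. */
void end_transaction(int my_backend)
{
    assert(my_backend >= 0 && my_backend < MAX_BACKENDS);
    atomic_store_explicit(&running_transactions[my_backend], INVALID_XID,
                          memory_order_release);
}

typedef struct {
    xid_t xmax;                   /* first xid not yet assigned at snapshot time */
    xid_t running[MAX_BACKENDS];  /* per-slot copy of the shared array */
} snapshot_t;

/* Get snapshot: locked against inserts, but may race with removals, so a
 * transaction that has just committed can still show up as running. */
void get_snapshot(snapshot_t *snap)
{
    xid_lock_acquire();
    snap->xmax = next_xid;
    for (int i = 0; i < MAX_BACKENDS; i++)
        snap->running[i] = atomic_load_explicit(&running_transactions[i],
                                                memory_order_acquire);
    xid_lock_release();
}
```

Because removal can only turn a slot from "running" to empty, a snapshot racing with it errs toward treating a just-committed transaction as still in progress, i.e. toward a slightly older view.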
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/