On Thu, Jul 28, 2011 at 6:50 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > I wonder whether we could do something involving WAL properties --- the
>> > current tuple visibility logic was designed before WAL existed, so it's
>> > not exploiting that resource at all. I'm imagining that the kernel of a
>> > snapshot is just a WAL position, ie the end of WAL as of the time you
>> > take the snapshot (easy to get in O(1) time). Visibility tests then
>> > reduce to "did this transaction commit with a WAL record located before
>> > the specified position?".
> Why not just cache a "reference snapshot" near the WAL writer and maybe
> also save it at some interval in WAL in case you ever need to restore an
> old snapshot at some WAL position for things like time travel?
> It may be cheaper lock-wise not to update ref. snapshot at each commit,
> but to keep latest saved snapshot and a chain of transactions
> committed / aborted since. This means that when reading the snapshot you
> read the current "saved snapshot" and then apply the list of commits.

Yeah, interesting idea. I thought about that. You'd need not only
the list of commits but also the list of XIDs that had been published,
since the commits have to be removed from the snapshot and the
newly-published XIDs have to be added to it (in case they commit later
while the snapshot is still in use).
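
To make the bookkeeping concrete, here is a minimal sketch of deriving a snapshot from a saved snapshot plus the events logged since. It assumes a snapshot can be modelled as nothing more than the set of in-progress XIDs (a real snapshot carries more than that), and the names PUBLISH, FINISH and derive_snapshot are made up for illustration.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical event kinds, invented for this sketch.
PUBLISH = "publish"  # a new XID has been advertised as in progress
FINISH = "finish"    # an XID has committed or aborted

Event = Tuple[str, int]


def derive_snapshot(saved_running: Iterable[int],
                    events: Iterable[Event]) -> FrozenSet[int]:
    """Replay the events logged since the saved snapshot was taken.

    Assumption: a snapshot is modelled as the set of in-progress XIDs.
    """
    running = set(saved_running)
    for kind, xid in events:
        if kind == PUBLISH:
            # Newly published XIDs must be added, in case they commit
            # later while this snapshot is still in use.
            running.add(xid)
        elif kind == FINISH:
            # Finished XIDs are removed from the in-progress set.
            running.discard(xid)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return frozenset(running)


snap = derive_snapshot(
    {100, 101},
    [(FINISH, 100), (PUBLISH, 102), (PUBLISH, 103), (FINISH, 103)],
)
```

With only the commit list, XID 102 would be missing from the result, which is why the published XIDs have to be logged too.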

You can imagine doing this with a pair of buffers. You write a
snapshot into the beginning of the first buffer and then write each
XID that is published or commits into the next slot in the array.
When the buffer is filled up, the next process that wants to publish
an XID or commit scans through the array and constructs a new snapshot
that compacts away all the begin/commit pairs and writes it into the
second buffer, and all new snapshots are taken there. When that
buffer fills up you flip back to the first one. Of course, you need
some kind of synchronization to make sure that you don't flip back to
the first buffer while some laggard is still using it to construct a
snapshot that he started taking before you flipped to the second one,
but maybe that could be made light-weight enough not to matter.
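
A rough single-threaded sketch of the two-buffer scheme, leaving out the synchronization entirely (which is the hard part). It assumes each log slot holds a bare XID, so that the first appearance of an XID means "published" and the second means "committed or aborted"; that encoding, the class name SnapshotBuffers and the log_slots parameter are all assumptions made for illustration.

```python
class SnapshotBuffers:
    """Two buffers, each holding a base snapshot plus a log of XIDs."""

    def __init__(self, log_slots, initial_running=()):
        self.log_slots = log_slots  # hypothetical tuning knob
        self.buffers = [
            {"base": set(initial_running), "log": []},
            {"base": set(), "log": []},
        ]
        self.active = 0  # index of the buffer new snapshots are taken from
        self.flips = 0   # how many times a compaction was needed

    @staticmethod
    def _replay(buf):
        # Assumption: an XID toggles between in progress and finished
        # each time it appears in the log.
        running = set(buf["base"])
        for xid in buf["log"]:
            if xid in running:
                running.discard(xid)
            else:
                running.add(xid)
        return running

    def _compact_and_flip(self):
        # Build a compacted snapshot in the other buffer, then switch.
        # A real implementation must first wait for laggards that are
        # still reading the buffer about to be overwritten.
        old = self.buffers[self.active]
        self.active ^= 1
        new = self.buffers[self.active]
        new["base"] = self._replay(old)
        new["log"] = []
        self.flips += 1
        return new

    def record(self, xid):
        """Log an XID that was just published or has just finished."""
        buf = self.buffers[self.active]
        if len(buf["log"]) >= self.log_slots:
            buf = self._compact_and_flip()
        buf["log"].append(xid)

    def take_snapshot(self):
        return frozenset(self._replay(self.buffers[self.active]))


b = SnapshotBuffers(log_slots=4, initial_running=[10])
for xid in (11, 10, 12, 11):  # publish 11, finish 10, publish 12, finish 11
    b.record(xid)
before_flip = b.take_snapshot()
b.record(13)                  # log is full, so this compacts and flips
after_flip = b.take_snapshot()
```

Every call to record() and take_snapshot() touches the same shared state, and take_snapshot() gets slower as the log grows.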

I am somewhat concerned that this approach might lead to a lot of
contention over the snapshot buffers. In particular, the fact that
you have to touch shared cache lines both to advertise a new XID and
when it gets committed seems less than ideal. One thing that's kind
of interesting about the "commit sequence number" approach is that -
as far as I can tell - it doesn't require new XIDs to be advertised
anywhere at all. You don't have to worry about overflowing the
subxids array because it goes away altogether. The commit sequence
number itself is going to be a contention hotspot, but at least it's
small and fixed-size.
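
For comparison, a toy sketch of what a commit-sequence-number scheme might look like. This is only my reading of the idea, not a worked-out design: CommitLog, commit_csn and the other names are invented here, and the dictionary stands in for whatever per-XID storage would really hold the commit numbers.

```python
class CommitLog:
    """Toy model: a snapshot is a single commit sequence number (CSN)."""

    def __init__(self):
        self.next_csn = 1     # the small, fixed-size shared counter
        self.commit_csn = {}  # xid -> CSN assigned at commit (assumed store)

    def commit(self, xid):
        # Only committing touches shared state; starting a transaction
        # does not advertise its XID anywhere in this model.
        csn = self.next_csn
        self.commit_csn[xid] = csn
        self.next_csn += 1
        return csn

    def take_snapshot(self):
        # The snapshot is just the CSN of the most recent commit.
        return self.next_csn - 1

    def is_visible(self, xid, snapshot_csn):
        # Visible only if the XID committed at or before the snapshot.
        csn = self.commit_csn.get(xid)
        return csn is not None and csn <= snapshot_csn


log = CommitLog()
log.commit(100)
snap = log.take_snapshot()
log.commit(101)  # commits after the snapshot was taken
```

Here XID 100 is visible to snap, while 101 (committed later) and any XID that never committed are not. Taking a snapshot reads one integer, and there is no subxids array to overflow.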

Another concern I have with this approach is - how large do you make
the buffers? If you make them too small, then you're going to have to
regenerate the snapshot frequently, which will lead to the same sort
of lock contention we have today - no one can commit while the
snapshot is being regenerated. On the other hand, if you make them
too big, then deriving a snapshot gets slow. Maybe there's some way
to make it work, but I'm afraid it might end up being yet another
arcane thing the tuning of which will become a black art among