
Re: cheaper snapshots

From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: cheaper snapshots
Date: 2011-07-28 14:17:09
Message-ID: 1311862629.3117.1362.camel@hvost
Lists: pgsql-hackers
On Thu, 2011-07-28 at 09:38 -0400, Robert Haas wrote:
> On Thu, Jul 28, 2011 at 6:50 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> > On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> > I wonder whether we could do something involving WAL properties --- the
> >> > current tuple visibility logic was designed before WAL existed, so it's
> >> > not exploiting that resource at all.  I'm imagining that the kernel of a
> >> > snapshot is just a WAL position, ie the end of WAL as of the time you
> >> > take the snapshot (easy to get in O(1) time).  Visibility tests then
> >> > reduce to "did this transaction commit with a WAL record located before
> >> > the specified position?".
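Here is a minimal Python sketch of the visibility test Tom describes. It assumes a toy in-memory WAL where positions are plain integers and each transaction's commit position (LSN) is kept in a dictionary. The names `WalLog`, `take_snapshot` and `xid_visible` are hypothetical and are not PostgreSQL functions. The sketch skips the hard part, which is storing and looking up the XID-to-commit-position mapping efficiently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WalLog:
    """Toy WAL (hypothetical): positions are integers; remembers where commit records landed."""
    end_lsn: int = 0
    commit_lsn: dict[int, int] = field(default_factory=dict)  # xid -> LSN of its commit record

    def append_record(self) -> int:
        # Each record advances the end of WAL by one position
        self.end_lsn += 1
        return self.end_lsn

    def log_commit(self, xid: int) -> None:
        self.commit_lsn[xid] = self.append_record()


@dataclass(frozen=True)
class WalSnapshot:
    lsn: int  # end of WAL at the moment the snapshot was taken


def take_snapshot(wal: WalLog) -> WalSnapshot:
    # O(1): no scan of running transactions, just read the current end of WAL
    return WalSnapshot(wal.end_lsn)


def xid_visible(wal: WalLog, snap: WalSnapshot, xid: int) -> bool:
    # Visible iff the transaction's commit record was already in WAL at snapshot time
    lsn = wal.commit_lsn.get(xid)
    return lsn is not None and lsn <= snap.lsn
```

A transaction that commits after the snapshot gets a larger LSN and stays invisible. A transaction that never commits has no commit LSN at all.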
> >
> > Why not just cache a "reference snapshot" near the WAL writer, and maybe
> > also save it in WAL at some interval in case you ever need to restore an
> > old snapshot at some WAL position for things like time travel?
> >
> > It may be cheaper lock-wise not to update the reference snapshot at each
> > commit, but to keep the latest saved snapshot and a chain of transactions
> > committed/aborted since then. This means that when reading the snapshot you
> > read the current "saved snapshot" and then apply the list of commits.
> Yeah, interesting idea.  I thought about that.  You'd need not only
> the list of commits but also the list of XIDs that had been published,
> since the commits have to be removed from the snapshot and the
> newly-published XIDs have to be added to it (in case they commit later
> while the snapshot is still in use).
> You can imagine doing this with a pair of buffers.  You write a
> snapshot into the beginning of the first buffer and then write each
> XID that is published or commits into the next slot in the array.
> When the buffer is filled up, the next process that wants to publish
> an XID or commit scans through the array and constructs a new snapshot
> that compacts away all the begin/commit pairs and writes it into the
> second buffer, and all new snapshots are taken there.  When that
> buffer fills up you flip back to the first one.  Of course, you need
> some kind of synchronization to make sure that you don't flip back to
> the first buffer while some laggard is still using it to construct a
> snapshot that he started taking before you flipped to the second one,
> but maybe that could be made light-weight enough not to matter.
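Robert's two-buffer scheme might look roughly like the following sketch. Everything here is hypothetical: `SnapshotLog`, the fixed `capacity`, and the use of one `threading.Lock` for both writers and readers. That lock is much heavier than the light-weight synchronization he hopes for, but it avoids the laggard problem in a few lines.

```python
from __future__ import annotations

import threading


class _Buffer:
    """A compacted base snapshot followed by a log of publish/finish events."""

    def __init__(self, xmax: int, running: frozenset[int]) -> None:
        self.xmax = xmax
        self.running = running
        self.events: list[tuple[str, int]] = []

    def replay(self) -> tuple[int, frozenset[int]]:
        xmax, running = self.xmax, set(self.running)
        for kind, xid in self.events:
            if kind == "publish":
                running.add(xid)
                xmax = max(xmax, xid + 1)
            else:  # "finish": committed or aborted
                running.discard(xid)
        return xmax, frozenset(running)


class SnapshotLog:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.flips = 0
        self._lock = threading.Lock()  # hypothetical: one lock for writers and readers
        self._bufs = [_Buffer(1, frozenset()), _Buffer(1, frozenset())]
        self._active = 0

    def _append(self, kind: str, xid: int) -> None:
        with self._lock:
            buf = self._bufs[self._active]
            if len(buf.events) >= self.capacity:
                # Full: compact away begin/commit pairs into the other buffer and flip.
                # (A shared-memory version would overwrite that buffer in place.)
                xmax, running = buf.replay()
                self._active ^= 1
                buf = self._bufs[self._active] = _Buffer(xmax, running)
                self.flips += 1
            buf.events.append((kind, xid))

    def publish(self, xid: int) -> None:
        self._append("publish", xid)

    def finish(self, xid: int) -> None:
        self._append("finish", xid)

    def snapshot(self) -> tuple[int, frozenset[int]]:
        # Holding the lock stops writers from flipping mid-replay; a real design
        # would need something lighter so that readers do not block commits.
        with self._lock:
            return self._bufs[self._active].replay()
```

The `capacity` parameter is exactly the tuning knob Robert worries about later: small values mean frequent compaction, large values mean longer replays per snapshot.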
> I am somewhat concerned that this approach might lead to a lot of
> contention over the snapshot buffers.  

My hope was that this contention would be the same as for writing the
WAL buffers today, and thus largely hidden by the existing WAL-writing
synchronization mechanism.

It really covers just the part that writes commit records to WAL, as
non-commit WAL records don't participate in snapshot updates.

Writing WAL is already a single point which needs locks or other kind of
synchronization. This will stay with us at least until we start
supporting multiple WAL streams, and even then we will need some
synchronisation between those.

> In particular, the fact that
> you have to touch shared cache lines both to advertise a new XID and
> when it gets committed seems less than ideal. 

Every commit record writer should do this as part of writing the commit
record. And since you mostly want the latest snapshot anyway, why not just
update the snapshot as part of the commit/abort?
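The suggestion to update the snapshot as part of writing the commit record can be sketched as follows. The sketch assumes that one lock (a hypothetical stand-in for whatever already serializes commit-record insertion) covers both the WAL append and publishing a fresh immutable snapshot, so readers simply grab the latest published one. New XIDs need not trigger an update: they are always at or above the published xmax and are therefore already treated as in progress. The class and method names are illustrative only.

```python
from __future__ import annotations

import threading


class CommitTimeSnapshot:
    def __init__(self) -> None:
        self._lock = threading.Lock()  # hypothetical stand-in for the commit-record insertion lock
        self._wal: list[tuple[str, int]] = []  # toy WAL of commit/abort records
        self._next_xid = 1
        self._running: set[int] = set()
        self._committed: set[int] = set()
        # Published snapshot (xmax, running XIDs): replaced wholesale, never mutated
        self._snapshot: tuple[int, frozenset[int]] = (1, frozenset())

    def begin(self) -> int:
        with self._lock:
            xid = self._next_xid
            self._next_xid += 1
            self._running.add(xid)
            # No republish needed: xid >= published xmax, so it already looks in progress
            return xid

    def finish(self, xid: int, committed: bool = True) -> None:
        with self._lock:
            self._wal.append(("commit" if committed else "abort", xid))
            self._running.discard(xid)
            if committed:
                self._committed.add(xid)
            # Same critical section as the WAL write: publish the updated snapshot
            self._snapshot = (self._next_xid, frozenset(self._running))

    def snapshot(self) -> tuple[int, frozenset[int]]:
        # Lock-free read; relies on CPython's atomic reference assignment.
        # A C version would need appropriate memory barriers instead.
        return self._snapshot

    def is_visible(self, snap: tuple[int, frozenset[int]], xid: int) -> bool:
        xmax, running = snap
        return xid < xmax and xid not in running and xid in self._committed
```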

Do we need the ability to take fast "recent snapshots" at all?

>  One thing that's kind
> of interesting about the "commit sequence number" approach is that -
> as far as I can tell - it doesn't require new XIDs to be advertised
> anywhere at all.  You don't have to worry about overflowing the
> subxids[] array because it goes away altogether.  The commit sequence
> number itself is going to be a contention hotspot, but at least it's
> small and fixed-size.
> Another concern I have with this approach is - how large do you make
> the buffers?  If you make them too small, then you're going to have to
> regenerate the snapshot frequently, which will lead to the same sort
> of lock contention we have today - no one can commit while the
> snapshot is being regenerated.  On the other hand, if you make them
> too big, then deriving a snapshot gets slow.  Maybe there's some way
> to make it work, but I'm afraid it might end up being yet another
> arcane thing the tuning of which will become a black art among
> hackers...

Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book:
