From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Date: 2015-04-17 08:38:56
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2015-04-16 09:43:54 -0700, Peter Geoghegan wrote:
> On Thu, Apr 16, 2015 at 2:23 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I'm, completely independent of logical decoding, of the *VERY* strong
> > opinion that 'speculative insertions' should never be visible when
> > looking with normal snapshots. For one it allows to simplify
> > considerations around wraparound (which has proven to be a very good
> > idea, c.f. multixacts + vacuum causing data corruption years after it
> > was thought to be harmless). For another it allows to reclaim/redefine
> > the bit after a database restart/upgrade. Given how complex this is and
> > how scarce flags are that seems like a really good property.
> >
> > And avoiding those flags to be visible to the outside requires a WAL
> > record, otherwise it won't be correct on the standby.
> I'm a bit distracted here, and not sure exactly what you mean. What's
> a normal snapshot?

Normal visibility semantics, i.e. SnapshotMVCC, not SnapshotDirty.

> Do you just mean that you think that speculative insertions should be
> explicitly affirmed by a second record (making it not a speculative
> tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has
> no chance of seeing a tuple until it was affirmed by a second in-place
> modification, regardless of tuple xmin xact commit status?

Yes. I think

a) HEAP_SPECULATIVE should never be visible outside in a
committed transaction. That allows us to redefine what exactly the bit
means and such after a simple restart. On IM Heiki said he wants to
replace this by a bit in the item pointer, but I don't think that
changes things substantially.

b) t_ctid should not contain a speculative token in committed
(i.e. visible to other sessions using mvcc semantics) tuple. Right now
(i.e. master) t_ctid will point to itself for non-updated tuples. I
don't think it's good to have something in there that's not an actual
ctid in there. The number of places that look into t_ctid for
in-progress insertions of other sessions is smaller than the ones that
look at tuples in general.

c) Cleaning the flag/ctid after a completed speculative insertion makes
it far less likely that we end up waiting on a other backend when we
wouldn't have to. If a session inserts a large number of tuples
speculatively (surely *not* a unlikely thing in the long run) it gets
rather likely that tokens are reused. Which means if another backend
touches these in-progress records it's quite possible that the currently
acquired token is the same as the one on a tuple that has actually
finished inserting.


Andres Freund

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-04-17 08:54:51 Re: Replication identifiers, take 4
Previous Message Andres Freund 2015-04-17 08:30:56 Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0