Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Date: 2015-04-17 18:02:15
Message-ID: CAM3SWZRks8MzYsOcLMYefpoPgFvvEXqaRmVuKkZA9jkHoR-8jw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 17, 2015 at 1:38 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Do you just mean that you think that speculative insertions should be
>> explicitly affirmed by a second record (making it not a speculative
>> tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has
>> no chance of seeing a tuple until it was affirmed by a second in-place
>> modification, regardless of tuple xmin xact commit status?
>
> Yes. I think

Good.

> a) HEAP_SPECULATIVE should never be visible outside in a
> committed transaction. That allows us to redefine what exactly the bit
> means and such after a simple restart. On IM Heikki said he wants to
> replace this by a bit in the item pointer, but I don't think that
> changes things substantially.

I guess you envisage that HEAP_SPECULATIVE is an infomask2 bit? I
haven't been using one of those for a couple of revisions now,
preferring to use the offset MagicOffsetNumber to indicate a
speculative t_ctid value. I'm slightly surprised that Heikki now wants
to use an infomask2 bit (if that is what you mean), because he wanted
to preserve those by doing the MagicOffsetNumber thing. But I guess
we'd still be preserving the bit under this scheme (since it's always
okay to do something different with the bit after a restart).

Why is it useful to consume an infomask2 bit after all? Why did Heikki
change his mind - due to wanting representational redundancy? Or do I
have it all wrong?

> b) t_ctid should not contain a speculative token in committed
> (i.e. visible to other sessions using mvcc semantics) tuple. Right now
> (i.e. master) t_ctid will point to itself for non-updated tuples. I
> don't think it's good to have something in there that's not an actual
> ctid. The number of places that look into t_ctid for
> in-progress insertions of other sessions is smaller than the ones that
> look at tuples in general.

Right. So if a tuple is committed, it should not have HEAP_SPECULATIVE
set, nor should it have a t_ctid offset of MagicOffsetNumber. But a
non-ctid t_ctid (a speculative token) remains possible for
non-committed tuples visible to some types of snapshots (in
particular, dirty snapshots).
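
To make the t_ctid encoding under discussion concrete, here is a
minimal sketch in C. It is not code from the patch: the struct and
function names are hypothetical, the sentinel value 0xfffe is an
assumption, and keeping the token in the block-number half of t_ctid
is likewise assumed purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an item pointer (t_ctid). */
typedef struct {
    uint32_t block;   /* block number, or (assumed) the speculative token */
    uint16_t offset;  /* line pointer offset, or the sentinel below */
} SketchItemPointer;

/* Assumed sentinel; the patch's MagicOffsetNumber may use another value. */
#define SKETCH_MAGIC_OFFSET ((uint16_t) 0xfffe)

typedef struct {
    SketchItemPointer t_ctid;
} SketchTupleHeader;

/* Mark a tuple as speculatively inserted under the given token. */
static void
sketch_set_speculative_token(SketchTupleHeader *tup, uint32_t token)
{
    tup->t_ctid.block = token;
    tup->t_ctid.offset = SKETCH_MAGIC_OFFSET;
}

/* True while t_ctid holds a token rather than a real ctid. */
static bool
sketch_is_speculative(const SketchTupleHeader *tup)
{
    return tup->t_ctid.offset == SKETCH_MAGIC_OFFSET;
}

/* Read the token back; only meaningful for a speculative tuple. */
static uint32_t
sketch_get_speculative_token(const SketchTupleHeader *tup)
{
    assert(sketch_is_speculative(tup));
    return tup->t_ctid.block;
}

/*
 * Affirm the insertion: overwrite the token so that t_ctid points at
 * the tuple itself, as it does for any non-updated tuple.
 */
static void
sketch_confirm_insertion(SketchTupleHeader *tup,
                         uint32_t self_block, uint16_t self_offset)
{
    tup->t_ctid.block = self_block;
    tup->t_ctid.offset = self_offset;
}
```

Under this sketch, the rule in (b) amounts to saying that
sketch_is_speculative() must be false for every committed tuple, while
a dirty snapshot may still observe it as true for an in-progress
insertion.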

> c) Cleaning the flag/ctid after a completed speculative insertion makes
> it far less likely that we end up waiting on another backend when we
> wouldn't have to. If a session inserts a large number of tuples
> speculatively (surely *not* an unlikely thing in the long run) it gets
> rather likely that tokens are reused. Which means if another backend
> touches these in-progress records it's quite possible that the currently
> acquired token is the same as the one on a tuple that has actually
> finished inserting.

It's more important than that, actually. If the inserter fails to
clear its tuple's speculative token, and then releases its token
lmgr lock, it will cause livelock with many upserting sessions.
Coordinating which other session gets to lazily clear the speculative
token (cleaning up after the original inserter) seemed quite hazardous
when I looked into it. Maybe you could fix it by interleaving buffer
locks and lmgr locks, but we can't do that.
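
The ordering constraint described above can be modelled with a small
single-threaded sketch in C. Everything here is hypothetical (the
Model* names, the three-way outcome for the waiter); it only
illustrates one reading of why a token left on the tuple after the
lock is released gives other sessions nothing to wait on, so they can
only retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one speculatively inserted tuple. */
typedef struct {
    bool     has_token;  /* tuple still carries a speculative token */
    uint32_t token;      /* token value stored in the tuple */
} ModelTuple;

/* Hypothetical model of the inserter's token lock. */
typedef struct {
    bool     held;       /* inserter currently holds the lock */
    uint32_t token;      /* token the lock was taken on */
} ModelTokenLock;

typedef enum { MODEL_PROCEED, MODEL_WAIT, MODEL_RETRY } ModelAction;

/* Inserter side: clear the token first, only then release the lock. */
static void
model_finish_insert(ModelTuple *tup, ModelTokenLock *lock)
{
    assert(lock->held && lock->token == tup->token);
    tup->has_token = false;  /* step 1: tuple becomes an ordinary tuple */
    lock->held = false;      /* step 2: waiters may now wake up */
}

/* Waiter side: what another session can do on encountering the tuple. */
static ModelAction
model_waiter_action(const ModelTuple *tup, const ModelTokenLock *lock)
{
    if (!tup->has_token)
        return MODEL_PROCEED;  /* ordinary tuple, no token to wait on */
    if (lock->held && lock->token == tup->token)
        return MODEL_WAIT;     /* insertion in progress, sleep on the lock */
    return MODEL_RETRY;        /* stale token, lock gone: can only loop */
}
```

With model_finish_insert() as written, a waiter sees either MODEL_WAIT
or MODEL_PROCEED. MODEL_RETRY is reachable only if the lock is released
while the token is still on the tuple, which is the case the paragraph
above warns about.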

--
Peter Geoghegan
