Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
Date: 2015-02-16 00:44:51
Message-ID: CAM3SWZQb4uEsa5JhAB-nKfm=_9DJGai4YE1Gpi5nY1Tv1ooDuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
>> Thanks for taking a look at it. That's somewhat cleaned up in the
>> attached patchseries - V2.2. This has been rebased to repair the minor
>> bit-rot pointed out by Thom.
>
> I don't really have the energy to review this patch in one batch, so I'd
> really like to split this into two:

I think we're all feeling worn out at this point, by this patch and by
others. I do appreciate your making the effort.

> 1. Solve the existing "problem" with exclusion constraints that if two
> insertions happen concurrently, one of them might be aborted with "deadlock
> detected" error, rather then "conflicting key violation" error. That really
> wouldn't be worth fixing otherwise, but it happens to be a useful stepping
> stone for this upsert patch.
>
> 2. All the rest.

I think we need a more pragmatic approach to dealing with the
exclusion constraint problems.

> I took a stab at extracting the parts needed to do 1. See attached. I didn't
> modify ExecUpdate to use speculative insertions, so the issue remains for
> UPDATEs, but that's easy to add.

Cool.

> I did not solve the potential for livelocks (discussed at
> http://www.postgresql.org/message-id/CAM3SWZTfTt_fehet3tU3YKCpCYPYnNaUqUZ3Q+NAASnH-60teA@mail.gmail.com).
> The patch always super-deletes the already-inserted tuple on conflict, and
> then waits for the other inserter. That would be nice to fix, using one of
> the ideas from that thread, or something else.

How about we don't super-delete at all with exclusion constraints? I'd
be willing to accept unprincipled deadlocks for exclusion constraints,
because they already exist today for UPSERT/NOSERT type use cases, and
with idiomatic usage seem far less likely for the IGNORE variant
(especially with exclusion constraints). I can see people using ON
CONFLICT UPDATE where they'd almost or actually be better off using a
plain UPDATE - that's quite a different situation. I find livelocks to
be a *very* scary prospect, and I don't think the remediations that
were mentioned are going to fly. It's just too messy, and too much of
a niche use case. TBH I am skeptical of the idea that we can fix
exclusion constraints properly in this way at all, at least not
without the exponential backoff thing, which just seems horrible.

> We never really debated the options for how to do the speculative insertion
> and super-deletion. This patch is still based on the quick & dirty demo
> patches I posted about a year ago, in response to issues you found with
> earlier versions. That might be OK - maybe I really hit the nail on
> designing those things and got them right on first try - but more likely
> there are better alternatives.

Intuitively, it seem likely that you're right here. However, it was
necessary to work through the approach to see what the problems are.
For example, the need for modifications to tqual.c became apparent
only through putting a full implementation of ON CONFLICT UPDATE
through various tests. In general, I've emphasized that the problems
with any given value locking implementation are non-obvious. Anyone
who thinks that he can see all the problems with his approach to value
locking without having a real implementation that is tested for
problems like unprincipled deadlocks is probably wrong.

That's sort of where I'm coming from with suggesting we allow
unprincipled deadlocks with exclusion constraints. I'm not confident
that we can find a perfect solution, and know that it's a perfect
solution. It's too odd, and too niche a requirement. Although you
understood what I was on about when I first talked about unprincipled
deadlocks, I think that acceptance of that idea came much later from
other people, because it's damn complicated. It's not worth it to add
some weird "Dining philosophers" exponential backoff thing to make
sure that the IGNORE variant when used with exclusion constraints can
never deadlock in an unprincipled way, but if it is worth it then it
seems unreasonable to suppose that this patch needs to solve that
pre-existing problem. No?

If we do something like an exponential backoff, which I think might
work, I fear that that kind of yak-shaving will leave us with
something impossible to justify committing; a total mess. Better the
devil you know (possible unprincipled deadlocks with the IGNORE
variant + exclusion constraints).

> Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for
> every insertion? That seems bad for performance reasons. Also, are we happy
> with adding the new fields to the proc array? Another alternative that was
> discussed was storing the speculative insertion token on the heap tuple
> itself. (See
> http://www.postgresql.org/message-id/52D00D2D.6030307@vmware.com)

Whatever works, really. I can't say that the performance implications
of acquiring that hwlock are at the forefront of my mind. I never
found that to be a big problem on an 8 core box, relative to vanilla
INSERTs, FWIW - lock contention is not normal, and may be where any
heavweight lock costs would really be encountered. Besides, the update
path won't have to do this at all.

I don't see how storing the promise token in heap tuples buys us not
having to involve heavyweight locking of tokens. (I think you may have
made a thinko in suggesting otherwise)

> Are we happy with the way super-deletion works? Currently, the xmin field is
> set to invalid to mark the tuple as super-deleted. That required checks in
> HeapTupleSatisfies* functions. One alternative would be to set xmax equal to
> xmin, and teach the code currently calls XactLockTableWait() to check if
> xmax=xmin, and not consider the tuple as conflicting.

That couldn't work without further HeapTupleSatisfiesDirty() logic.
Besides, why should one transaction be entitled to insert a
conflicting value tuple just because a still running transaction
deleted it (having also inserted the tuple itself)? Didn't one
prototype version of value locking #2 have that as a bug [1]? In fact,
originally, didn't the "xmin set to invalid" thing come immediately
from realizing that that wasn't workable?

I too think the tqual.c changes are ugly. I can't see a way around
using a token lock, though - I would only consider (and only consider)
refining the interface by which a waiter becomes aware that it must
wait on the outcome of the inserting xact's speculative
insertion/value lock (and in particular, that is should not wait on
xact outcome). We clearly need something to wait on that isn't an
XID...heavyweight locking of a token value is the obvious way of doing
that.

(thinks about it some more for a while)

It seems like what you're really taking issue with - the real issue -
is that we're introducing this whole new idea of making a tuple
visible for a moment, a moment that is potentially only a slice of its
originating transaction's total duration. Setting xmin to
invalidTransactionId is one way to do that, and may,
counterintuitively, even be the best way, but the fact that we're
playing new games with visibility is the real point of concern. We
could do something like store the token in the heap tuple, allowing us
to make fewer additions to PGPROC, but that seems kind of pointless
(and a waste of disk space). Playing new games with visibility is the
nature of the beast.

We keep talking about mechanism, but what if we *did* have the
infomask bit space to represent that xmin=xmax is a broken promise
tuple (and not a deleted, self-inserted, combocid-requiring tuple that
may or may not have been a promise tuple at some point in time)? I
think that a minor aesthetic improvement is the best we can hope for,
and maybe even that is too much to expect. Maybe we should just own
the fact that we're playing a new sort of game with visibility, and
keep things as they are. You might consider that setting xmin to
invalidTransactionId is more "honest" than any alternative scheme that
happens to avoid changes to HeapTupleSatisfiesMVCC() and so on.

Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit.
Maybe he is right...if that can be made to be reliable (always
WAL-logged), it could be marginally better than setting xmin to
invalidTransactionId. But only marginally; it seems like your
discomfort is basically inherent to playing these new games with
visibility, which is inherent to this design. As I said, I am having a
hard time seeing a way to do anything more than polish what we have
here. That's probably fine, though...it hasn't proven to be too
problematic (exclusion constraints aside).

Not having to change HeapTupleSatisfiesMVCC() and so on (to check if
xmin = InvalidTransactionId) is not obviously a useful goal here,
since ISTM that any alternative would have to *logically* do the same
thing. If I'm off the mark about your thinking here, please correct
me....are you just worried about extra cycles in the
HeapTupleSatisfies* routines?

[1] http://www.postgresql.org/message-id/52D5C74E.6090608@vmware.com
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2015-02-16 00:50:21 Re: Issue installing doc tools on OSX
Previous Message Andres Freund 2015-02-16 00:24:21 Re: Replication identifiers, take 4