Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)
Date: 2015-02-02 17:36:19
Message-ID: 54CFB593.3020509@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/03/2015 10:42 PM, Peter Geoghegan wrote:
> On Sat, Jan 3, 2015 at 2:41 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> A-ha, I see. And this can happen without INSERT ON CONFLICT, too? In that
>> case, one of the transactions is bound to error and roll back anyway, but
>> you get a deadlock error instead of the constraint violation error, which is
>> not as nice.
>
> Agreed.
>
> I haven't experimentally verified that this can happen with exclusion
> constraints without the ON CONFLICT patch being involved, but I would
> be very surprised if it didn't. How could it possibly not happen? It's
> not all that bad since, as you say, one or the other xact was going to
> be aborted anyway. Since the insertions would have to occur at exactly
> the same time, even if one backend were to get an exclusion violation
> rather than being killed by the deadlock detector, the choice of which
> backend to raise an error within would be essentially random anyway.

Yep. I just tested this, and I can confirm that it does happen with
vanilla exclusion constraints. If two backends insert the same value at
the same time, you usually get the error "conflicting key value violates
exclusion constraint", but if the timing is right, you get "deadlock
detected" instead.

Let's focus on fixing that case first. I wouldn't care otherwise, but it
allows us to work on the locking, the "super-deletion" etc. without all
the rest of the patch. That will provide you a way to split the patch
into two: 1. Avoid deadlock errors with regular exclusion constraints,
with all the locking etc. that's needed to solve that, and 2. Upsert.

>> 1. On conflict, mark the inserted tuple as killed, and retry. But before
>> retrying, acquire a new kind of lock on the table, let's call it
>> SpeculativeInsertionLock. This fixes the deadlock, by retrying instead of
>> sleeping, and avoids the livelock because the new lock ensures that only one
>> backend retries at a time.
>
> We "super delete" the tuple on retry already. However, we wait for the
> other xact with our broken promise tuple still physically inserted
> into the heap. We can't super delete the tuple before
> XactLockTableWait()/SpeculativeInsertionWait() lock acquisition,
> because doing so risks livelock. I think you already agree with me up
> to here.
>
> However, if we're not sleep waiting on the other xact (rather, we're
> "retrying [with a new class of exclusive table-level lock] instead of
> sleeping"), why should our xact be able to do useful work on retry?
> Why won't it just spin/busy wait? More to the point, why wouldn't it
> deadlock when it is obliged to wait on a third inserter's tuple?
> AFAICT, you've just moved the problem around, because now we're
> obliged to get a shared lock on the xid or speculative token
> (XactLockTableWait()/SpeculativeInsertionWait()) of a session that
> itself wants this new table level lock that only we have.

Sorry, I was very terse in my previous explanation. Let me try again.
Let's begin with a simpler, poor-performance version of the scheme:

1. Acquire the new SpeculativeInsertionLock on the table
2. Insert tuple to heap and index.
3. Scan the index to see if there is any other conflicting tuple. (If
there isn't, or the conflicting tuple is already committed, we're done)
4. Super-delete the tuple we already inserted
5. Release SpeculativeInsertionLock
6. XactLockTableWait() on the other guy

Note that we don't try to acquire any other locks while holding
SpeculativeInsertionLock.

Compare this with the way unique-checks in b-tree work today. The B-tree
check is free of race conditions because we hold the lock on the b-tree
page while we check the visibility of the page, and we don't insert the
index tuple if we have to wait. The SpeculativeInsertionLock
accomplishes the same. It makes steps 3 and 4 atomic.

Compared to your patch as it stands, this fixes the deadlock because
when A inserts a tuple and scans the index, any conflicting tuples it
finds must have completed the insertion before A. The other inserters
won't later try to wait on the tuple that A inserts.

We could do the above without the new lock, but as you've said, that
would risk a livelock. Two concurrent inserters might repeatedly insert,
scan, find each other, super-delete, and retry and not make progress.
The lock fixes that by ensuring that there is only one inserter is doing
the above at a time.

(With the above procedure, we could also first scan the index for
conflicts, and only insert after that, because the
SpeculativeInsertionLock prevents anyone else from inserting a
conflicting row. But of course, we're not going to hold an exclusive
table-level lock under normal circumstances anyway; the above was just a
prelude to the real scheme below.)

The full procedure is this:

1. Insert tuple to heap and index
2. Scan the index to see if there is any other conflicting tuple. (If
there isn't, or the conflicting tuple is already committed, we're done)
3. Super-delete the tuple we already inserted

4. Acquire SpeculativeInsertionLock on the table
5. Insert tuple to heap and index
6. Scan the index to see if there is any other conflicting tuple. (If
there isn't, or the conflicting tuple is already committed, we're done)
7. Super-delete the tuple we already inserted
8. Release SpeculativeInsertionLock
9. XactLockTableWait() on the other guy

So in a nutshell, first try to do the insertion without the table-level
lock. But because that would be prone to livelocks, if it doesn't
immediately work, retry with the table-level lock.

>> 2. Use e.g. the XID (or backend id or something) to resolve the tie. When
>> you have inserted a tuple and find that it conflicts with another
>> in-progress insertion, check the conflicting tuple's xmin. If it's < current
>> XID, wajt for the other inserter to finish. Otherwise kill the
>> already-inserted tuple first, and wait only after that.
>
> That sounds scary. I think it might introduce livelock hazards. What
> about some third xact, that's waiting on our XID, when we ourselves
> super delete the already-inserted tuple in deference to the first
> xact/inserter?

He will get woken up when we super-delete the already-inserted tuple.
What problem do you see there?

I actually like this scheme the best. It's simple. I haven't made any
rigorous analysis to prove that it's correct, but I don't immediately
see a problem with it.

> I also worry about the bloating this could cause.

Well, this is all under the assumption that conflicts and
super-deletions are rare. Of course, we must avoid the situation where
we have to attempt an insertion dozens of times until it succeeds,
leaving behind 20x bloat compared to the real data. I don't think that
would happen with these schemes, although it will need some testing I guess.

>> Can there be other lock types involved in the deadlock? AFAICS it's always
>> going to be between two or more backends that wait on each with
>> XactLockTableWait (or some variant of that specific to speculative
>> insertions).
>
> I believe you're right about that. But don't forget that at least with
> the current implementation, we also get spurious exclusion violations.

Ok, I clearly haven't been able to keep up with all the issues here. How
do spurious exclusion violations arise?

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message José Luis Tallón 2015-02-02 19:09:59 Re: Fwd: [GENERAL] 4B row limit for CLOB tables
Previous Message Robert Haas 2015-02-02 16:48:35 Re: Odd behavior of updatable security barrier views on foreign tables