Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2013-09-30 15:32:34
Message-ID: CA+TgmobwDZSVcKWTmVNBxeHSe4LCnW6zon2soH6L7VoO+7tAzw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 27, 2013 at 8:36 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Fri, Sep 27, 2013 at 5:36 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I don't have another idea either. In fact, I'd go so far as to say
>>> that doing any third thing that's better than those two to any
>>> reasonable person is obviously impossible. But I'd add that we simply
>>> cannot roll back at read committed, so we're just going to have to hold
>>> our collective noses and do strange things with visibility.
>>
>> I don't accept that as a general principle. We're writing the code;
>> we can make it behave any way we think best.
>
> I presume you're referring to the principle that we cannot throw
> serialization failures at read committed. I'd suggest that letting
> that happen would upset a lot of people, because it's so totally
> unprecedented. A large segment of our user base would just consider
> that to be Postgres randomly throwing errors, and would be totally
> dismissive of the need to do so, and not without some justification -
> no one else does the same. The reality is that the majority of our
> users don't even know what an isolation level is. I'm not just talking
> about people who use Postgres more casually, such as Heroku
> customers. I've personally talked to people who didn't even know what
> a transaction isolation level was, and who were in a position where
> they really, really should have known.

Yes, it might not be a good idea. But I'm just saying, we get to decide.

>> I doubt that any change to HeapTupleSatisfiesMVCC() will be
>> acceptable. This feature needs to restrain itself to behavior changes
>> that only affect users of this feature, I think.
>
> I agree with the principle of what you're saying, but I'm not aware
> that those changes to HeapTupleSatisfiesMVCC() imply any behavioral
> changes for those not using the feature. Certainly, the standard
> regression tests and isolation tests still pass, for what it's worth.
> Having said that, I have not thought about it enough to be willing to
> actually defend that bit of code. Though I must admit that I am a
> little encouraged by the fact that it passes casual inspection.

Well, at a minimum, it's a performance worry. Those functions are
*hot*. Single branches do matter there.

> I am starting to wonder if it's really necessary to have a "blessed"
> update that can see the locked, not-otherwise-visible tuple. Doing
> that certainly has its disadvantages, both in terms of code complexity
> and in terms of being arbitrarily restrictive. We're going to have to
> allow the user to see the locked row after it's updated (the new row
> version that we create will naturally be visible to its creating xact)
> - is it really any worse that the user can see it before an update (or
> a delete)? The user could decide to effectively make the update change
> nothing, and see the same thing anyway.

If we're not going to just error out over the invisible tuple, the user
needs some way to interact with it. The details are negotiable.
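
For anyone following along, the kind of interaction under discussion
is the wCTE pattern proposed upthread; the syntax was still in flux,
so treat this as a sketch (table and values are made up):

    CREATE TABLE tab (a int PRIMARY KEY, b text);

    -- Insert, or lock the conflicting row instead of erroring, and
    -- then update the locked row via the returned rejects:
    WITH rej AS (
        INSERT INTO tab (a, b) VALUES (1, 'new')
        ON DUPLICATE KEY LOCK FOR UPDATE
        RETURNING REJECTS *
    )
    UPDATE tab SET b = rej.b
    FROM rej
    WHERE tab.a = rej.a;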

> I get why you're averse to doing odd things to visibility - I was too.
> I just don't see that we have a choice if we want this feature to work
> acceptably with read committed. In addition, as it happens I just
> don't see that the general situation is made any worse by the fact
> that the user might be able to see the row before an update/delete.
> Isn't it also weird to update or delete something you cannot see?
>
> Couldn't EvalPlanQual() be said to be an MVCC violation on similar
> grounds? It also "reaches into the future". Locking a row isn't really
> that distinct from updating it, both in terms of the code footprint
> and from a logical perspective.

Yes, EvalPlanQual() is definitely an MVCC violation.
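
To spell out what EvalPlanQual() does at READ COMMITTED, for anyone
who hasn't stared at it recently (hypothetical table and values):

    CREATE TABLE acct (id int PRIMARY KEY, bal int);
    INSERT INTO acct VALUES (1, 100);

    -- Session A:
    BEGIN;
    UPDATE acct SET bal = bal - 10 WHERE id = 1;

    -- Session B (READ COMMITTED), concurrently:
    UPDATE acct SET bal = bal - 10 WHERE id = 1;  -- blocks on A's row

    -- Session A:
    COMMIT;

    -- Session B now wakes up, re-checks its WHERE clause against A's
    -- committed row version (which B's snapshot cannot see), and
    -- updates on top of it: bal ends at 80.  That "reach into the
    -- future" is EvalPlanQual().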

>>> Realize you can generally only kill the heap tuple *before* you have
>>> the row lock, because otherwise a totally innocent non-HOT update (may
>>> not update any unique indexed columns at all) will deadlock with your
>>> session, which I don't think is defensible, and will probably happen
>>> often if allowed to (after all, this is upsert - users are going to
>>> want to update their locked rows!).
>>
>> I must be obtuse; I don't see why that would deadlock.
>
> If you don't see it, then you aren't being obtuse in asking for
> clarification. It's really easy to be wrong about this kind of thing.
>
> If the non-HOT update updates some random row, changing the key
> columns, it will lock that random row version. It will then proceed
> with "value locking" (i.e. inserting index tuples in the usual way, in
> this case with entirely new values). It might then block on one of the
> index tuples we, the upserter, have already inserted (these are our
> "value locks" under your scheme). Meanwhile, we (the upserter) might
> have *already* concluded that the *old* heap row that the regular
> updater is in the process of rendering invisible is to blame in
> respect of some other value in some later unique index, and that *it*
> must be locked. Deadlock. This seems very possible if the key values
> are somewhat correlated, which is probably generally quite common.

OK, I see.
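
To make the scenario concrete (schema and values invented for
illustration):

    CREATE TABLE t (a int, b int);
    CREATE UNIQUE INDEX t_a ON t (a);
    CREATE UNIQUE INDEX t_b ON t (b);
    INSERT INTO t VALUES (1, 1);

    -- Upserter U proposes (a, b) = (2, 1); it inserts its t_a index
    -- tuple for a = 2 successfully - that entry is U's "value lock".
    -- Updater V runs: UPDATE t SET a = 2 WHERE a = 1;  (non-HOT)
    -- V locks the heap row (1, 1) first, then inserts its new index
    -- tuples, and blocks on U's t_a entry for a = 2.
    -- Meanwhile U finds a conflict in t_b (b = 1) against the old row
    -- (1, 1) and tries to lock it - but V already holds that row.
    -- U waits on V's row lock; V waits on U's value lock: deadlock.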

> The important observation here is that an updater, in effect, locks
> both the old and new sets of values (for non-HOT updates). And as I've
> already noted, any practical "value locking" implementation isn't
> going to be able to prevent the update from immediately locking the
> old, because that doesn't touch an index. Hence, there is an
> irresolvable disconnect between value and row locking.

This part I don't follow. "locking the old"? What irresolvable
disconnect? I mean, they're different things; I get *that*.

> Are we comfortable with this? Before you answer, consider that there
> were lots of bugs (their words) in the MySQL implementation of this
> same basic idea surrounding excessive deadlocking - I heard through
> the grapevine that they fixed a number of bugs along these lines, and
> that their implementation has historically had lots of deadlocking
> problems.
>
> I think that the way to deal with weird, unprincipled deadlocking is
> simply to not hold value locks at the same time as row locks. It is
> my contention that the lock starvation hazards this less-clever
> approach may present aren't actually an issue, unless you have some
> kind of highly implausible perfect storm of read-committed aborters
> inserting around the same values - and only one of those needs to
> commit to remedy the situation; the first "no" answer is all we need
> to give up.

OK, I take your point, I think. The existing system already acquires
value locks when a tuple lock is held, during an UPDATE, and we can't
change that.

>>> Contrast this with my design, where re-ordering of would-be
>>> conflicters across unique indexes (or serialization failures) can
>>> totally nip this in the bud *if* the contention can be re-ordered
>>> around, but if not, at least there is no need to worry about
>>> aggregating bloat at all, because it creates no bloat.
>>
>> Yeah, possibly.
>
> I think that re-ordering is an important property of any design where
> we cannot bail out with serialization failures. I know it seems weird,
> because it seems like an MVCC violation to have our behavior altered
> as a result of a committed transaction that isn't even visible to
> us. As I think you appreciate, on a certain level that's just the
> nature of the beast. This might sound stupid, but: you can say the
> same thing about unique constraint violations! I do not believe that
> this introduces any anomalies that read committed doesn't already
> permit according to the standard.
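
Fair point; the unique-violation analogue, spelled out (hypothetical
table):

    CREATE TABLE uq (a int UNIQUE);

    -- Session A:
    BEGIN;
    INSERT INTO uq VALUES (42);

    -- Session B (READ COMMITTED), concurrently:
    INSERT INTO uq VALUES (42);   -- blocks waiting on A's transaction

    -- Session A:
    COMMIT;

    -- Session B now gets "duplicate key value violates unique
    -- constraint": its behavior was altered by a committed row that
    -- its snapshot cannot see.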

I worry about the behavior being confusing and hard to understand in
the presence of multiple unique indexes and reordering. Perhaps I
simply don't understand the problem domain well enough yet.

From a user perspective, I would really think people would want to
specify a set of key columns and then update if a match is found on
those key columns. Suppose there's a unique index on (a, b) and
another on (c), and the user passes in (a,b,c)=(1,1,1). It's hard for
me to imagine that the user will be happy to update either (1,1,2) or
(2,2,1), whichever exists. In what situation would that be the
desired behavior?
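
Concretely (hypothetical schema):

    CREATE TABLE kv (a int, b int, c int);
    CREATE UNIQUE INDEX kv_ab ON kv (a, b);
    CREATE UNIQUE INDEX kv_c ON kv (c);
    INSERT INTO kv VALUES (1, 1, 2);   -- conflicts with (1,1,1) on (a, b)
    INSERT INTO kv VALUES (2, 2, 1);   -- conflicts with (1,1,1) on c

    -- An upsert of (1, 1, 1) conflicts with *both* existing rows, on
    -- different indexes.  Which row gets locked and updated falls out
    -- of index evaluation order, not out of anything the user asked for.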

Also, under such a programming model, if somebody drops a unique index
or adds a new one, the behavior of someone's application can
completely change. I have a hard time swallowing that. It's an
established precedent that dropping a unique index can make some other
operation fail (e.g. ADD FOREIGN KEY, and more recently CREATE VIEW ..
GROUP BY), and of course it can cause performance or plan changes.
But overturning the semantics is, I think, something new, and it
doesn't feel like a good direction.
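
To illustrate, continuing the hypothetical kv example from above:

    -- With kv_c in place, upserting (1, 1, 1) may well lock and
    -- update the row with c = 1.
    DROP INDEX kv_c;
    -- Now the same statement can only conflict on (a, b), so it locks
    -- and updates the (a, b) = (1, 1) row instead.  The statement's
    -- semantics changed, not merely its plan.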

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
