Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: hlinnaka(at)iki(dot)fi
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Date: 2015-03-18 19:40:23
Message-ID: CAM3SWZTCdim7AOK5Hm=rqJLP7i1cKjBwYvWSz8BaacwR8MOkYw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 18, 2015 at 11:41 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> AFAICS, there is no need to go and clear the tag after the insert has
> completed.
>
> Here's what I had in mind: the inserter tags the tuple with the speculative
> insertion token, by storing the token in the t_ctid field. If the inserter
> needs to super-delete the tuple, it sets xmax like in a regular deletion,
> but also sets another flag to indicate that it was a super-deletion.
>
> When another backend inserts, and notices that it has a potential conflict
> with the first tuple, it tries to acquire a hw-lock on the token. In most
> cases, the inserter has long since completed the insertion, and the
> acquisition succeeds immediately, but you have to check, because the token is
> not cleared on a completed insertion. It would be slightly faster for the
> second backend if the inserter actually removed the token after the
> insertion has completed: it wouldn't need to check the lock if the tuple was
> not tagged in the first place. But that'd be just a tiny optimization, for
> the rare case that there is a conflict, and surely not worth it.
>
> Hmm, I just realized a little fly in that ointment: if the inserter inserts
> enough tuples to wrap around the token counter (4 billion?) in a single
> transaction, another backend that inserts could confuse a new speculative
> insertion with one that completed a long time ago. The risk seems small
> enough that we could just accept it. I don't think anything particularly bad
> would happen on a false positive here. Or we could also use all 48 bits
> available in the t_ctid to push it truly in the realm of ain't-gonna-happen.
> Or we could use (relfilenode, block, token) as the lock tag for the
> SpeculativeInsertionLock.

Maybe we'd use all 48 bits anyway, but it's not a major concern either
way. Speculative token locks (by design) are only held for an instant.
I think a spurious wait would be entirely benign, and very unlikely.

> Regarding the physical layout: We can use a magic OffsetNumber value above
> MaxOffsetNumber to indicate that the t_ctid field stores a token rather than
> a regular ctid value. And another magic t_ctid value to indicate that a
> tuple has been super-deleted. The token and the super-deletion flag are
> quite ephemeral, they are not needed after the inserting transaction has
> completed, so it's nice to not consume the valuable infomask bits for these
> things. Those states are conveniently not possible on an updated tuple, when
> we would need the t_ctid field for its current purpose.

You seem pretty convinced that this is the way to go. I'll try to
produce a revision that works this way shortly. I don't think it will
be all that hard to come up with something that proves the idea out.

>> I think Heikki's concern is something different, although I am not
>> altogether up to speed on this and may be confused. The issue is:
>> suppose that process A and process B are both furiously upserting into
>> the same table with the same key column but, because they have
>> different costing parameters, process A consistently chooses index X
>> and process B consistently chooses index Y. In that situation, will
>> we deadlock, livelock, error out, bloat, or get any other undesirable
>> behavior, or is that A-OK?
>
> Right, that's what I had in mind.

Oh, I see. I totally failed to understand that that was the concern.

I think it'll be fine. The pre-check is going to look for a heap tuple
using one or the other of (say) a pair of equivalent indexes. We might
miss the heap tuple because we picked an index that had yet to have a
physical index tuple inserted, and then hit a conflict on optimistic
insertion (the second phase). But there is no reason to think that
that won't happen anyway. The ordering of operations isn't critical.

The one issue you might still have is a duplicate violation, because
you happened to infer the one unique index whose unique violations are
not treated as recoverable conflicts (and a race occurred, which is
unlikely). I don't really care about this,
though. You really are inferring one particular unique index, and for
reasons like this I think it would be a mistake to try to pretend that
the unique index is 100% an implementation detail. That's why I called
the new clause a unique index inference specification.

This hypothetical set of unique indexes will always have n-1 redundant
unique indexes - they must closely match. That's something that calls
into question why the user wants things this way to begin with. Also,
note that one unique index will consistently "win", since the
insertion order is stable (the relcache puts them in OID order). So it
will not be all over the map.

--
Peter Geoghegan
