Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Date: 2015-03-17 19:11:58
Message-ID: 55087C7E.3070004@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/05/2015 03:18 AM, Peter Geoghegan wrote:
> Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
> CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
> this development justifies a new thread, though.)

I'm still not sure the way the speculative locking works is the best
approach. Instead of clearing xmin on super-deletion, a new flag on the
heap tuple seems more straightforward. And we could put the speculative
insertion token in t_ctid, instead of stashing it in the PGPROC array.
That would again seem more straightforward.

A couple of quick random comments:

> /*
> * plan_speculative_use_index
> * Use the planner to decide speculative insertion arbiter index
> *
> * Among indexes on target of INSERT ... ON CONFLICT, decide which index to
> * use to arbitrate taking alternative path. This should be called
> * infrequently in practice, because it's unusual for more than one index to
> * be available that can satisfy a user-specified unique index inference
> * specification.
> *
> * Note: caller had better already hold some type of lock on the table.
> */
> Oid
> plan_speculative_use_index(PlannerInfo *root, List *indexList)
> {
> ...
> /* Locate cheapest IndexOptInfo for the target index */

If I'm reading this correctly, if there are multiple indexes that match
the unique index inference specification, we pick the cheapest one.
Isn't that unstable? Two backends running the same INSERT ON CONFLICT
statement might pick different indexes, and the decision might change
over time as the table is analyzed. I think we should have a more robust
rule. Could we easily just use all matching indexes?

> ... Deferred unique constraints are not supported as
> + arbiters of whether an alternative <literal>ON CONFLICT</> path
> + should be taken.

We really need to find a shorter term for "arbiter of whether an
alternative path should be taken". Different variations of that term are
used a lot, and it's tedious to read.

> * There is still an unresolved semantics issue with unique index
> inference and non-default opclasses. I think it's sufficient that the
> available/defined unique indexes dictate our idea of a unique
> violation (that necessitates taking the alternative path). Even in a
> world where there exists a non-default opclass with an "equals"
> operator that does not match that of the default opclass (that is not
> really the world we live in, because the only counter-example known is
> made that way specifically to *discourage* its use by users), this
> seems okay to me. It seems okay to me because surely the relevant
> definition of equality is the one actually chosen for the available
> unique index. If there exists an ambiguity for some strange reason
> (i.e. there are two unique indexes, on the same attribute(s), but with
> different "equals" operators), then its a costing issue, so the
> behavior given is essentially non-predictable (it could end up being
> either...but hey, those are the semantics). I have a very hard time
> imagining how that could ever be the case, even when we have (say)
> case insensitive opclasses for the text type. A case insensitive
> opclass is stricter than a case sensitive opclass. Why would a user
> ever want both on the same attribute(s) of the same table? Is the user
> really more or less expecting to never get a unique violation on the
> non-arbitrating unique index, despite all this?
>
> If reviewers are absolutely insistent that this theoretical ambiguity
> is a problem, we can add an optional CREATE INDEX style opclass
> specification (I'm already using the IndexElems representation from
> CREATE INDEX for the inference specification, actually, so that would
> be easy enough). I really have a hard time believing that the ugliness
> is a problem for those hypothetical users that eventually consider
> "equals" operator ambiguity among opclasses among available unique
> indexes to be a problem. I haven't just gone and implemented this
> already because I didn't want to document that an opclass
> specification will be accepted. Since there is literally no reason why
> anyone would care today, I see no reason to add what IMV would really
> just be cruft.

I've been thinking that it would be nice to be able to specify a
constraint name. Naming an index directly feels wrong, as in relational
and SQL philosophy, indexes are just an implementation detail, but
naming a constraint is a fair game. It would also be nice to be able to
specify "use the primary key".

Attached patch contains a few more things I saw at a quick read.

- Heikki

Attachment Content-Type Size
misc-upsert-issues.patch application/x-patch 4.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-03-17 19:22:33 Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Previous Message Greg Stark 2015-03-17 19:11:28 Re: Bug in point releases 9.3.6 and 9.2.10?