Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
Date: 2015-02-02 01:06:13
Message-ID: 20150202010613.GA25543@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

A first (not actually that quick :() look through the patches to see
what actually happened in the last months. I didn't keep up with the
thread.

Generally the split into the individual commits doesn't seem to make
much sense to me. The commits individually (except the first) aren't
indivdiually commitable and aren't even meaningful. Splitting off the
internal docs, tests and such actually just seems to make reviewing
harder because you miss context. Splitting it so that individual piece
are committable and reviewable makes sense, but... I have no problem
doing the user docs later. If you split of RLS support, you need to
throw an error before it's implemented.

0001:
* References INSERT with ON CONFLICT UPDATE, can thus not be committed
independently. I don't think those references really are needed.
* I'm not a fan of the increased code duplication in
ExecCheckRTEPerms(). Can you look into cleaning that up?
* Also the comments inside the ACL_INSERT case still reference UPDATE.

Other than that I think we can just go ahead and commit this ahead of
time. Mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit
message only.

0007:
* "AMs" alone isn't particularly unique.
* Without the context of the discussion "unprincipled deadlocks" aren't
well defined.
* Too many "" words.
* Waiting "too long" isn't defined. Neither is why that'd imply
unprincipled deadlocks. Somewhat understandable with the context of
the discussion, but surely not a couple years down the road.
* As is I don't find the README entry super helpful. It should state
what the reason for doing this is cleary, start at the higher level,
and then move to the details.
* Misses details about the speculative heavyweight locking of tuples.

0002:
* Tentatively I'd say that killspeculative should be done via a separate
function instead of heap_delete()
* I think we should, as you ponder in a comment, do the OCU specific
stuff lazily and/or in a separate function from BuildIndexInfo(). That
function is already quite visible in profiles, and the additional work
isn't entirely trivial.
* I doubt logical decoding works with the patch as it stands.
* The added ereport (i.e. user facing) error message in
ExecInsertIndexTuples won't help a user at all.
* Personally I don't care one iota for comments like "Get information
from the result relation info structure.". Yes, one of these already
exists, but ...
* If a arbiter index is passed to ExecCheckIndexConstraints(), can't we
abort the loop after checking it? Also, do we really have to iterate
over indexes for that case? How about moving the loop contents to a
separate function and using that separately for the arbiter cases?
* Don't like the comment above check_exclusion_or_unique_constraint()'s
much. Makes too much of a special case of OCU
* ItemPointerIsValid
* ExecCheckHeapTupleVisible's comment "It is not acceptable to proceed "
sounds like you're talking with a child or so ;)
* ExecCheckHeapTupleVisible()'s errhint() sounds like an
argument/excuse (actually like a code comment). That's not going to
help a user at all.
* I find the modified control flow in ExecInsert() pretty darn ugly. I
think this needs to be cleaned up. The speculative case should imo be
a separate function or something.
* /*
* This may occur when an instantaneously invisible tuple is blamed
* as a conflict because multiple rows are inserted with the same
* constrained values.
How can this happen? We don't insert multiple rows with the same
command id?
* ExecLockUpdatedTuple() has (too?) many comments, but little overview
of what/why it is doing what it does on a higher level.

* plan_speculative_use_index: "Use the planner to decide speculative
insertion arbiter index" - Huh? " rel is the target to undergo ON
CONFLICT UPDATE/IGNORE." - Which rel?
* formulations as "fundamental nexus" are hard to understand imo.
* Perhaps it has previously been discussed but I'm not convinced by the
reasoning for not looking at opclasses in infer_unique_index(). This
seems like it'd prohibit ever having e.g. case insensitive opclasses -
something surely worthwile.
* Doesn't infer_unique_index() have to look for indisvalid? This isn't
going to work well with a invalid (not to speak for a !ready) index.
* Is ->relation in the UpdateStmt generated in transformInsertStmt ever
used for anything? If so, it'd possibly generate some possible
nastyness due to repeated name lookups. Looks like it'll be used in
transformUpdateStmt
* What's the reason for the !pstate->p_parent? Also why the parens?
pstate->p_is_speculative = (pstate->parentParseState &&
(!pstate->p_parent_cte &&
pstate->parentParseState->p_is_insert &&
pstate->parentParseState->p_is_speculative));
* Why did you need to make %nonassoc DISTINCT and ON nonassoc in the grammar?
* The whole speculative insert logic isn't really well documented. Why,
for example, do we actually need the token? And why are there no
issues with overflow? And where is it documented that a 0 means
there's no token? ...
* Isn't "SpecType" a awfully generic (and nondescriptive) name?
* /* XXX: Make sure that re-use of bits is safe here */ - no, not
unless you change existing checks.
* /*
* Immediately VACUUM "super-deleted" tuples
*/
if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
InvalidTransactionId))
return HEAPTUPLE_DEAD;
Is that branch really needed? Shouldn't it just be happening as a
consequence of the already existing code? Same in SatisfiesMVCC. If
you actually needed that block, it'd need to be done in SatisfiesSelf
as well, no? You have a comment about a possible loop - but that seems
wrong to me, implying that HEAP_XMIN_COMMITTED was set invalidly.

Ok, I can't focus at all any further at this point. But there's enough
comments here that some even might make sense ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-02-02 01:39:31 Some dead code in metaphone() of fuzzystrmatch.c
Previous Message Noah Misch 2015-02-01 20:14:27 Re: Small doc patch about pg_service.conf