Re: UPSERT wiki page, and SQL MERGE syntax

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: UPSERT wiki page, and SQL MERGE syntax
Date: 2014-10-10 20:41:30
Message-ID: CAM3SWZRfFTzcK-gptOFfNWgt34=Bo136ciuaFnLFU-vkFuwxAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think what's realistic here is that the patch isn't going to get
> committed the way you have it today, because nobody else likes that
> design. That may be harsh, but I think it's accurate.

I do think that's harsh, because it's unnecessary: I am looking for
the best design. If you want to propose alternatives, great, but there
is a reason why I've done things that way, and that should be
acknowledged. I too think naming the unique index is ugly as sin, and
have said as much multiple times. We're almost on the same page here.

> But on the substance of the issue, I'm totally unconvinced by your
> argument in the linked email. Let's suppose you have this:
>
> create table foo (a int, b int);
> create unique index foo1 on foo (a);
> create unique index foo2 on foo (a);
> create unique index foo3 on foo (a) where b = 1;
>
> Anything that conflicts in foo1 or foo2 is going to conflict in the
> other one, and with foo3. Anything that conflicts in foo3 is perhaps
> also going to conflict in foo1 and foo2, or perhaps not. Yet, if the
> statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear
> what to do: when we hit a conflict in any of those three indexes,
> update the row. No matter which index has the conflict, updating is
> the right answer, and solves the conflict in all three indexes. I
> dunno what you'd expect someone to write in your syntax to solve this
> problem - presumably either (1) they can list any of the indexes that
> might conflict, (2) they must list all of the indexes that might
> conflict, or (3) it just doesn't work.

I expect them to name exactly one unique index. They should either do
that explicitly, or have one in mind and make the behavior implicit at
the risk of miscalculating (and having a surprising outcome). It
doesn't matter if it's foo1 or foo2 in this example (but foo3 is
different, obviously).

> Whatever the actual behavior
> of your patch is today, it seems to me that the conflict is,
> fundamentally, over a set of column values, NOT over a particular
> index. The index is simply a mechanism for noticing that conflict.

I think that this is the kernel of our disagreement. The index is not
simply a mechanism for noticing that conflict. The user had better
have one unique index in mind to provoke taking the alternative path
in the event of a would-be unique violation. Clearly it doesn't matter
much in this particular example. But what if there were two partial
unique indexes, that were actually distinct, but only in terms of the
constant appearing in their predicate? And what about having that
changed by a before insert row-level trigger? Are we to defer deciding
which unique index to use at the last possible minute?

As always with this stuff, the devil is in the details. If we work out
the details, then I can come up with something that's acceptable to
everyone. Would you be okay with this never working with partial
unique indexes? That gives me something to work with.

> If you want to allow this to work with expression indexes, that's not
> really a problem; you can let the user write INSERT .. ON CONFLICT
> (upper(foo)) UPDATE ... and match that to the index. It wouldn't
> bother me if the first version of this feature couldn't target
> expression indexes anyway, but if you want to include that, having the
> user mention the actual expression rather than the index name
> shouldn't make much difference.

I'm not that worried about expression indexes, actually. I'm mostly
worried about partial unique indexes, particularly when before insert
row-level triggers are in play (making *that* play nice with
expression indexes is harder still, but expression indexes on their
own are probably not that much of a problem).

>>> Also, how about making the SET clause optional, with the semantics
>>> that we just update all of the fields for which a value is explicitly
>>> specified:
>>>
>>> INSERT INTO overwrite_with_abandon (key, value)
>>> VALUES (42, 'meaning of life')
>>> ON DUPLICATE (key) UPDATE;

> Your syntax allows the exact same thing; it simply require the user to
> be more verbose in order to get that behavior. If we think that
> people wanting that behavior will be rare, then it's fine to require
> them to spell it out when they want it. If we think it will be the
> overwhelming common application of this feature, and I do, then making
> people spell it out when we could just as well infer it is pointless.

Did you consider my example? I think that people will like this idea,
too - that clearly isn't the only consideration, though. As you say,
it would be very easy to implement this. However, IMV, we shouldn't,
because it is hazardous. MySQL doesn't allow this, and they tend to
find expedients like this useful.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-10-10 20:49:31 Re: Column Redaction
Previous Message Peter Geoghegan 2014-10-10 19:37:13 Re: UPSERT wiki page, and SQL MERGE syntax