Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT
Date: 2022-06-30 22:07:11
Message-ID: CAKFQuwZCb3SuuwzLJ7nsoYOUmXJrjvhAZ76vswn-FeX78L_oGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 30, 2022 at 2:31 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Thu, Jun 30, 2022 at 2:07 PM David G. Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> > Current:
> > "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the
> > existing row using the table's name (or an alias), and to [rows] proposed
> > for insertion using the special excluded table."
> >
> > The word table in that sentence is wrong and not a useful way to think
> of the thing which we've named excluded. It is a single value of a
> composite type having the structure of the named table.
>
> I think that your reasoning is correct, but I don't agree with your
> conclusion. The term "special excluded table" is a fudge, but that
> isn't necessarily a bad thing. Sure, we could add something about the
> UPDATE being similar to an UPDATE with a self-join, as I said
> upthread. But I think that that would make the concept harder to
> grasp.
>

I don't think incorporating self-joining to be helpful; the status quo is
better than that. I believe people mostly think of "composite variable"
from the current description even if we don't use those words - or such a
concept can be explained by analogy with NEW and OLD (I think of it like a
trigger, only that SQL doesn't have variables so we cannot use that term,
hence just using "name").

>
> > I'll agree that most people will mentally paper over the difference and
> go merrily on their way. At least one person recently did not do that,
> which prompted an email to the community
>
> Can you provide a reference for this? Didn't see anything like that in
> the reference you gave upthread.
>

OK, the discussion I am recalling happened on Discord hence the lack of a
link.

On roughly 3/8 the following conversation occurred (I've trimmed out some
non-relevant comments):

>>>OP
Hello, I have a simple question.
My table has a column 'transaction_date'
I have an insert statement with an ON CONFLICT statement
I update using the 'excluded' values, but I only want to update if the
transaction date is the same or newer.
Do I just use: "WHERE EXCLUDED.transaction_date >= transaction_date"?
so the full query is something like: INSERT INTO table VALUES (pk, yadda1,
yadda2) ON CONFLICT (pk) DO UPDATE SET (yadda1 = EXCLUDED.yadda1, yadda2 =
EXCLUDED.yadda2) WHERE EXCLUDED.transaction_date >= transaction_date;

>>>Other Person
I mean, the ... like 3 examples imply what it contains, and it vaguely says
"and to rows proposed for insertion using the special excluded table."
but...
Still, based on the BNF, that should work as you stated it.

>>>OP
would perhaps it try to overwrite more than one row because many rows would
meet the criteria?
It seems like it limits it to the conflict row but..

>>>Other Person
Well, you're only conflicting on the PK, which is guaranteed to be unique.

>>>OP
Ah, so then it is limited to that row if it is specified within the ON
CONFLICT action if I am reading correct.
[...]
If it matters to you, the only thing I got wrong apparently (in my limited
non-sufficient testing) is that to access the current value within the
table row you must use the table name. So: WHERE EXCLUDED.transaction_date
>= tableName.transaction_date

>>>ME
"have access [...] to rows proposed for insertion using the special
excluded table.". You have an update situation where two tables (the
target and "excluded") are in scope with the exact same column names (by
definition) so any column references in the value expressions need to be
prefixed with which of the two tables you want to examine. As with a
normal UPDATE, the left side of the SET clause entry must reference the
target table and so its column cannot, and must not, be table qualified.
While it speaks of "rows" this is basically a per-row thing. As each row
is tested and finds a conflict the update is executed.

>>>Other Person
Mentioning something as critical as that offhand is a mistake IMO. It
should have its own section.
It's also not mentioned in the BNF, though it shows up in the examples. You
have to basically infer everything.

>>>ME
The exact wording of the conflict_action description in head is: "The SET
and WHERE clauses in ON CONFLICT DO UPDATE have access to the existing row
using the table's name (or an alias), and to rows proposed for insertion
using the special excluded table." I haven't read anything here that gives
me a hint as to how that ended up misinterpreted so that I could possibly
formulate an alternative wording. And I cannot think of a more appropriate
place to locate that sentence either. The examples do cover this and the
specifics here are not something that we try to represent in BNF.
I'd probably change "and to rows proposed for insertion" to "and to the
corresponding row proposed for insertion".

>>>OP
This does not change the original conclusion we arrived at correct? If I am
reading what you are saying right, since it only discovered the conflict
after examining the row, then by the same token it will only affect the
same row where the conflict was detected.

> I have a hard time imagining a user that reads the INSERT docs and
> imagines that "excluded" is a relation that is visible to the query in
> ways that are not limited to expression evaluation for the UPDATE's
> WHERE/SET.

Yes, and based on a single encounter I agree this doesn't seem like a
broadly encountered issue. My takeaway from that eventually led to this
proposal. The "Other Person" who is complaining about the docs is one of
the mentors on the Discord server and works for one of the corporate
contributors to the community. (I suppose Discord is considered public so
maybe this redaction is unnecessary...)

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-06-30 22:37:23 Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c)
Previous Message Peter Geoghegan 2022-06-30 21:30:38 Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT