Re: Guidance on INSERT RETURNING order

From: John Howroyd <jdhowroyd(at)googlemail(dot)com>
To: Mike Bayer <mike_mp(at)zzzcomputing(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Guidance on INSERT RETURNING order
Date: 2023-04-13 12:26:44
Message-ID: CAAGaYBxFNMK0xS6w8uM_z2M93hg539GHnBqdgDaYEQrbpT7inQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

A few more thoughts.

My " simply marshal the RETURNING results through and in memory/temporary
table maintaining the order from the SELECT and return these to the client
upon commit" isn't quite right with the "upon commit": that (the commit)
may happen much later with several statements wrapped into a transaction.
So very much an internal decision about when those results happen to be
returned. The sentiment is that the returning results are emitted as a
last step in the processing of the statement / execution plan. Future
statements in a multi-statement transaction may depend on the output.

However, the marshalling is presumably already happening (except any tie to
ordering to the original declared tuples); otherwise I really don't
understand how the with syntax (I think first suggested by Adrian Klaver)
would work. @SQLAlchemy, if you wanted to off load reordering to database
side a combination solution might be considered:

with rslt as (
INSERT INTO mytable (a, b)
SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num)
ORDER BY num
RETURNING mytable.id, mytable.other_auto_gen
)
select * from rslt
order by id
;

should work (not yet tested); and could be modified to the "sentinel"
columns. Where the primary key is explicitly in the inserted columns (and
you are not using an identity) then explicit matching would need to occur
unless you are given an order guarantee. As for randomly generated
GUID/UUID, I don't think anything will work other than a guarantee on order
or a surrogate "sentinel" column/explicit tuple id.

@Mike Bayer: Isn't the order guarantee sufficient (wasn't that the original
assumption in the 2.0 code)? I do quite understand the wish for having one
solution that fits all without dialectic dependencies. However, in my
opinion, this one is going to run for some time and is heavily dependent on
internals. With mssql, this seems to me to be deep down in the insert
internals and unlikely to change any time soon (at least until they start
losing market share because other DBMSs do it better). Here (PostgreSQL)
the actual mechanisms required can probably be achieved much more readily
(given the will). But the fundamental issue of matching code side objects
with database records (with side effects) remains and is only going to
become more prominent. @PostgreSQL, isn't this your opportunity to lead
the way again!-).

On Thu, 13 Apr 2023 at 05:26, Mike Bayer <mike_mp(at)zzzcomputing(dot)com> wrote:

> here's what would be cool, a new token called "tuple_order" or something
>
>
> INSERT INTO table (a, b, c) VALUES ((1, 2, 3), (4, 5, 6), ...) RETURNING
> table.id, inserted.tuple_order
>
> tuple_order would be incrementing values 1, 2, 3, 4, 5, ... which
> correlate the each row delivered by RETURNING to each entry in the VALUES
> clause, in the order they were stated in that VALUES clause, that is entry
> (1, 2, 3) would be tuple_order 1, entry (4, 5, 6) would be tuple order 2,
> etc.
>
> That token would solve the entire problem in all cases. The 1800 line
> changeset I've been working on all week would go away (if every database
> had this) and the change would be reduced to a few dozen lines.
>
>
> On Wed, Apr 12, 2023, at 11:05 PM, John Howroyd wrote:
>
> To clarify, the work around (for SQLAlchemy) does not require the
> RETURNING values to be ordered; they will be reordered code side. Longer
> term matching the order up would be an added bonus!
>
> On Thu, 13 Apr 2023 at 04:02, John Howroyd <jdhowroyd(at)googlemail(dot)com>
> wrote:
>
> The ideal solution would be that the RETURNING values are in the same
> order as the declared rows in the table literal "FROM (VALUES (10, 11, 1),
> (12, 13, 2)) AS sel(p1, p2, num)". But before any of you jump down my
> throat about about sets and a bar to parallel inserts, I think it's a small
> matter of some internal smoke and mirrors: simply marshal the RETURNING
> results through and in memory/temporary table maintaining the order from
> the SELECT and return these to the client upon commit. At this moment, I
> believe the folk at SQLAlchemy are looking for a work around that can work
> for most use cases with either an identity column or a fake "sentinel"
> column for matching back to the ORM objects. There may be an upper limit
> of 1000 to the number rows in a single insert (this arises in MS SQL
> Server, who absolutely do not guarantee the order of their OUTPUT clause --
> it's tied to the physical row insert rather than marshalled).
>
> My feeling is that PostgreSQL may already do this. See the commit:
>
> https://github.com/postgres/postgres/commit/9118d03a8cca3d97327c56bf89a72e328e454e63
> and the original message thread
>
> https://www.postgresql.org/message-id/flat/CADBwkHv1nUx5bqxqRLp26k8iUWSnY3uW-1_0EMYxKns0t2JnUA%40mail.gmail.com
> and references to undesired behaviour prior to PostgreSQL 9.6 such as in
> https://www.postgresql.org/docs/current/sql-select.html.
>
> The test cases in the above commit use:
> SELECT nextval('sequence'), datavals FROM ... ORDER BY something;
> type syntax. And I presume that's exactly what's happening internally
> when there's a serial identity column (via query rewrite rules or another
> mechanism).
>
> So really, that just needs confirming. What happens with triggers?
> Presumably the same as long as query rewrite rules are used, but this is
> presumably getting deeper into the code for actual inserts after the
> initial select. The jump to the output being ordered, is just a guess but
> there's some bits and pieces that seem to suggest that there may indeed be
> a marshalling process going on in the background (whether or not that is
> linked to the original order is another matter).
>
> I have set up a PostgreSQL server to test if I can break this hypothesis
> and see what query explains can allude to. Does anyone have a test case
> where the order is not preserved?
>
> Might I also point out that if the implementation of parallel inserts does
> create a bar then doing so may end up with the programmatic interfaces
> (such as SQLAlchemy) not being able to use that feature (possibly reverting
> back to single inserts). Ur, so what would be the problem being solved
> with parallel inserts?
>
> On Thu, 13 Apr 2023 at 02:27, Mike Bayer <mike_mp(at)zzzcomputing(dot)com> wrote:
>
> We do something different for uuids. These are usually created client
> side anyway or are very easy to create client side if not and then we also
> have a separate sentinel column option.
>
> Overall the whole thing is an optimization that can be turned off for
> uncommon cases. We just can't go back to having insert of 1000 rows be
> 3-5x slower for all tables w server generated primary keys which is what
> statement at a time gets us. Surrogate integer Pks w sequence or identity
> is the big case.
>
> On Wed, Apr 12, 2023, at 8:58 PM, Thorsten Glaser wrote:
> > On Wed, 12 Apr 2023, Mike Bayer wrote:
> >
> >>ascending values for "mytable_id_seq" will correspond to the ascending
> >>values for "num".
> >
> > But, again, what if it’s uuid or something instead of a sequence?
> >
> > bye,
> > //mirabilos
> > --
> > 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld
> :-)
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Mike Bayer 2023-04-13 13:32:37 Re: Guidance on INSERT RETURNING order
Previous Message Amit Kapila 2023-04-13 09:31:43 Re: Support logical replication of DDLs