Re: Guidance on INSERT RETURNING order

From: "Mike Bayer" <mike_mp(at)zzzcomputing(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Guidance on INSERT RETURNING order
Date: 2023-04-12 23:38:38
Message-ID: be108555-da2a-4abc-a46b-acbe8b55bd25@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Apr 12, 2023, at 6:24 PM, Thorsten Glaser wrote:
> On Wed, 12 Apr 2023, Kirk Wolak wrote:
>
> >The solution exists. Pre-fetch the IDs, assign them and insert them with
> >the IDs. Then you have 100% control.
> >SELECT NEXTVAL('tbl_seq') from GENERATE_SERIES(1, <total_needed>);
>
> This would be a nice solution… but the PK could be always generated,
> or not even sequential (UUIDs for example). If you’re developing
> schema-first the application would not even (need to) know about how
> the IDs are generated as it’s the DB that generates them.
>
> AIUI the OP’s an SQLAlchemy developer, which is an ORM. Therefore he
> needs a generic solution.

im my view I already have the solution which is the form I referred to in my earlier email:

INSERT INTO mytable (a, b)
SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num
RETURNING mytable.id

the columns we want to INSERT, a and b, come from p1 and p2 in the SELECT. the third value in each table-row value inside the VALUES clause we call the "sentinel" and we apply it as an incrementing number which we also ORDER BY.

per Tom Lane's email at https://www.postgresql.org/message-id/29386.1528813619%40sss.pgh.pa.us in 2018, assuming mytable.id is serial (or I would assume IDENTITY), this resolves to something equivalent to:

INSERT INTO mytable (id, a, b)
SELECT nextval("mytable_id_seq"), p1, p2 FROM (
SELECT p1, p2 FROM (VALUES (10, 11, 1), (12, 13, 2)) AS sel(p1, p2, num) ORDER BY num
) AS ss RETURNING mytable.id

So the above is one part we want to confirm is accurate, hasn't changed, is not going to change much as far as its ultimate behavior. It means that as rows are generated from the SELECT, the ascending values for "mytable_id_seq" will correspond to the ascending values for "num". That's the important part; we are sending in an number that controls the sorting of records as they are delivered outwards, and the generated identifiers, assuming they are incrementing, will sort in that same way as each value is associated with the rows emanating from the inner SELECT. it does not matter if the sequence has gaps in it from other transactions or whatever, only that it increments in the same direction as "num" which we pass in.

In this sequence of events, the INSERT conceptually hasn't even happened for a particular row before the row has been associated with a sequence value. INSERT can do whatever it wants, it can break out into 20 threads and jumble up all the rows or whatever. When it sends them out in RETURNING, whatever order it comes back is no problem; we just want those "mytable_id_seq" values back. We sort by that when we get the rows back. This ordering matches the one we put in. So the way these threads often seem to discuss how INSERTS and RETURNING are both not ordered are missing this particular point, that we really just need a way to relate server generated values to the rows we put in.

This form comes from two places, one is from Microsoft SQL Server which documents as well as guaranteed to maintain order at https://learn.microsoft.com/en-us/sql/t-sql/statements/insert-transact-sql?view=sql-server-ver16; "INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted." ; for PostgreSQL, the 2018 thread above seems to confirm a similar behavior for PostgreSQL.

So we were looking for feedback from PG developers on the above assumptions. Beyond that, throughout all the theoretical talk about "well PG might change this someday", while PG most certainly can and maybe will change some of these assumptions, it would produce a lot of surprises for many users who knowingly or unknowingly have relied on various orderings of this way and it would likely lead to some very good documentation about this area of functionality, maybe even some new syntaxes or keywords to help with this extremely common use case. Documentation for these things is what the various threads on this subject often ask for. The first such thread I can find asking about this, which also ended kind of ambiguously, was in 2012 here: https://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg204156.html . Then there's the 2018 thread, and also a bunch on stackoverflow. Everyone is left guessing or not totally sure since there's no official documentation for this.

We're going to document in SQLAlchemy the approaches we are using for different backends and I wouldnt be surprised if when people google for "PostgreSQL INSERT RETURNING ORDERING" our own docs are going to come up near the top. That seems to happen a lot.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2023-04-13 00:30:32 Re: FW: Error!
Previous Message Greg Hennessy 2023-04-12 23:37:41 Re: FW: Error!