Re: Revisiting UPDATE FROM ... ORDER BY not respected

From: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Revisiting UPDATE FROM ... ORDER BY not respected
Date: 2011-04-13 04:10:54
Message-ID: BLU0-SMTP488332F93CDFF508349E3296AA0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thanks Tom,

>> I don't think this really works if multiple processes try to update the
table concurrently --- does that ever happen in your apps?
<<

Technically possible, but the production reality makes it unlikely.
Operationally, it makes no sense for it to be run more than once, or by more
than one person or against the same table as it represnts a data import
(will only be run once to prepare the data). I will ask production about
this to be sure.

Like I said, I am baffled

>> What PG versions are they using? Pre-8.2 this couldn't have been relied
on at all, because the planner would still try to flatten the subselect.
<<

In this case, 8.3.

>> I rather wonder why you're insisting on sequential assignments
at all.
<<

The relation "my_schema.my_table" is actually a table that represents a set
of operations to a data import process. The order of processing is
represented by the field "order_by_value". Those values come from an
external source and are not synced with our domain set (i.e. our sequences
for generating primary key values) and of course are likely to conflict.

All import tables must get an import id - represented by the field "id". The
import process expects this field to be filled and unique so that it can be
related to the data warehouse asset management and operation auditing
systems.

Thanks,

Carlo

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: April 12, 2011 8:15 PM
To: Carlo Stonebanks
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Revisiting UPDATE FROM ... ORDER BY not respected

Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca> writes:
> A few years ago I asked about creating a single UPDATE statement to assign
> id's from a sequence, with the sequences applied in a particular order. In
> other words, order the table, then apply nextval-generated id's to the id
> field in question.
> Here is the original post:
> http://archives.postgresql.org/pgsql-general/2009-04/msg01123.php
> The explanation and solution provided made total sense. To my surprise,
> every now and then users still report sequences being applied out of
order.

What PG versions are they using? Pre-8.2 this couldn't have been relied
on at all, because the planner would still try to flatten the subselect.

> Below is the code in question. I admit that I am completely baffled. Does
> anyone have any clues?

I don't think this really works if multiple processes try to update the
table concurrently --- does that ever happen in your apps? Two
processes starting to execute this query at about the same time would
assign distinct sets of IDs for the same (or at least overlapping) sets
of rows, and then only one of them would actually get to update any
particular row. Depending on chances of timing, that could easily
result in an out-of-order set of updates reaching commit.

Another thought is that in READ COMMITTED mode, you're at risk of extra
evaluations of nextval(), because the query tree will be re-evaluated
after waiting out a conflicting update to a target row. The query as
given seems safe against that effect when competing against other
occurrences of itself because of the "WHERE my_table.id IS NULL" bit ---
if somebody else beats you to a row update, the newer nextval value will
just get dropped on the floor (if it's even generated at all, which it
might not be). However, if there are ever concurrent updates to the
target row that don't change the id column to non-null, you'd possibly
have a problem from re-evaluations of nextval().

You could probably defend against both of those effects by taking a
table lock that prevents other updates while you do this. On the whole,
though, I rather wonder why you're insisting on sequential assignments
at all. If this operation can be done on a whim by independent
processes then sequentiality is likely to be approximate at best anyway.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2011-04-13 05:01:55 Re: 9.0 Out of memory
Previous Message Jeremy Palmer 2011-04-13 04:02:20 9.0 Out of memory