Re: BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Lars Vonk <lars(dot)vonk(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery
Date: 2019-03-27 14:14:21
Message-ID: 17074.1553696061@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Lars Vonk <lars(dot)vonk(at)gmail(dot)com> writes:
> So what I understand so far is that allthough the query is not garantueed
> to return the same single row, it is not possible it returned and updated
> multiple rows in a single execution?

The LIMIT node will absolutely not return more than one row per execution
of the subquery. However ... what you're asking about is how many rows
can get locked, which is a slightly different question.

> I ran the query again with EXPLAIN ANALYZE (but I am unsure if this means
> anything since we do not have the exact EXPLAIN of that particular query):

AFAICS, this plan shape wouldn't have the issue because the subquery is
on the outside of a nestloop and so would only get run once. What I'm
supposing is that you get trouble if the planner decides to go with a
nestloop semijoin (with the IN subquery on the inside of that loop).

Now either way, the plan tree for the subquery itself ought to look
like what you have here:

> -> Limit (cost=146.51..146.52 rows=1 width=22) (actual time=1.033..1.034 rows=0 loops=1)
> -> LockRows (cost=146.51..146.52 rows=1 width=22) (actual time=1.033..1.033 rows=0 loops=1)
> -> Sort (cost=146.51..146.51 rows=1 width=22) (actual time=1.033..1.033 rows=0 loops=1)
> Sort Key: delayed_jobs_1.priority, delayed_jobs_1.run_at
> Sort Method: quicksort Memory: 25kB -> Seq Scan on delayed_jobs
> delayed_jobs_1 (cost=0.00..146.50 rows=1 width=22) (actual time=1.010..1.010 rows=0 loops=1)
> Filter: ((failed_at IS NULL) AND ((queue)::text = 'workflows'::text) AND (((run_at <= '2019-03-26 13:25:22.208747'::timestamp without time zone) AND ((locked_at IS NULL) OR (locked_at < '2019-03-26 09:25:22.20877'::timestamp without time zone))) OR ((locked_by)::text = 'host:job01.prod.jortt.nl pid:10029'::text)))
> Rows Removed by Filter: 160

The seqscan+sort is going to find and return all the rows that meet
that "filter" condition *as of the start of the query*. The LockRows
node is going to take the first of those and lock it, which will include
finding and locking any newer version of the row that exists due to a
concurrent update. If there is a newer version, it then rechecks whether
that version still satisfies the filter condition (via some magic we
needn't get into here). If so, it returns the row to the LIMIT node,
which returns it up and then declares it's done, so we have found and
locked exactly one row. However, if that first row has been updated
to a state that *doesn't* satisfy the filter condition, the LockRows
node will advance to the next row of the seqscan+sort output, and lock
and retest that one. This repeats till it finds a row that does still
satisfy the filter condition post-locking.

So it's fairly easy to see how concurrent updates could cause this
query to lock N rows, for some N larger than one. But by itself
this isn't a very satisfactory explanation for the query locking
*all* the rows as you state happened. All of them would've had
to be concurrently updated to states that no longer satisfy the
filter condition, and that seems pretty unlikely to happen as a
consequence of a few other transactions individually doing the same
type of query.

Perhaps that could happen if the outer UPDATE were itself updating the row
to no longer satisfy the filter condition, so that the next iteration of
the subquery then skipped over it. But (a) your example doesn't seem to
do that, and (b) if it did happen like that then the symptoms would not
just be that the rows were locked, but that they were all updated as well.
(Or is that what you meant? You haven't been clear about what led you to
conclude that all the rows got locked.)

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-03-27 15:30:27 BUG #15717: Index not used when ordering by left joined table column
Previous Message Dean Rasheed 2019-03-27 12:46:29 Re: BUG #15708: RLS 'using' running as wrong user when called from a view