Re: BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery

From: Lars Vonk <lars(dot)vonk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery
Date: 2019-03-27 16:17:51
Message-ID: CAMX1ThjgrpaU6NkH7KTnbykOwKxuX5vD3HJ8kGkH_RYj++3GBA@mail.gmail.com
Lists: pgsql-bugs

> The LIMIT node will absolutely not return more than one row per execution
> of the subquery. However ... what you're asking about is how many rows
> can get locked, which is a slightly different question.

> But (a) your example doesn't seem to
> do that, and (b) if it did happen like that then the symptoms would not
> just be that the rows were locked, but that they were all updated as well.
> (Or is that what you meant? You haven't been clear about what led you to
> conclude that all the rows got locked.)

My apologies for the unclear report. I meant *updated*. I mixed up the
terminology because (a) I had assumed (oh dear) that locking and updating go
together, either one row or none, and (b) the query updates a locked_at
column.

So to be clear: this query caused more than one row (in fact thousands) to be
*updated*; I don't know whether they were also locked. We believe this
happened in that one single execution of the query, especially since the
query took roughly 82 seconds to complete.
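
For readers following the thread, a minimal sketch of the query shape under
discussion, reconstructed from the filter in the plan quoted further down
(the id column, the use of now(), and the 4-hour lock timeout are assumptions
for illustration, not the exact production statement):

    -- Sketch only: names and the 4-hour timeout are inferred from the plan below.
    UPDATE delayed_jobs
    SET    locked_at = now(),
           locked_by = 'host:job01.prod.jortt.nl pid:10029'
    WHERE  id IN (
             SELECT id
             FROM   delayed_jobs
             WHERE  failed_at IS NULL
               AND  queue = 'workflows'
               AND  ((run_at <= now()
                      AND (locked_at IS NULL
                           OR locked_at < now() - interval '4 hours'))
                     OR locked_by = 'host:job01.prod.jortt.nl pid:10029')
             ORDER  BY priority, run_at
             LIMIT  1
             FOR UPDATE
           );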

So is that possible given that query?

Lars

On Wed, Mar 27, 2019 at 3:14 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Lars Vonk <lars(dot)vonk(at)gmail(dot)com> writes:
> > So what I understand so far is that although the query is not guaranteed
> > to return the same single row, it is not possible that it returned and
> > updated multiple rows in a single execution?
>
> The LIMIT node will absolutely not return more than one row per execution
> of the subquery. However ... what you're asking about is how many rows
> can get locked, which is a slightly different question.
>
> > I ran the query again with EXPLAIN ANALYZE (but I am unsure if this means
> > anything since we do not have the exact EXPLAIN of that particular query):
>
> AFAICS, this plan shape wouldn't have the issue because the subquery is
> on the outside of a nestloop and so would only get run once. What I'm
> supposing is that you get trouble if the planner decides to go with a
> nestloop semijoin (with the IN subquery on the inside of that loop).
>
> Now either way, the plan tree for the subquery itself ought to look
> like what you have here:
>
> > ->  Limit  (cost=146.51..146.52 rows=1 width=22) (actual time=1.033..1.034 rows=0 loops=1)
> >       ->  LockRows  (cost=146.51..146.52 rows=1 width=22) (actual time=1.033..1.033 rows=0 loops=1)
> >             ->  Sort  (cost=146.51..146.51 rows=1 width=22) (actual time=1.033..1.033 rows=0 loops=1)
> >                   Sort Key: delayed_jobs_1.priority, delayed_jobs_1.run_at
> >                   Sort Method: quicksort  Memory: 25kB
> >                   ->  Seq Scan on delayed_jobs delayed_jobs_1  (cost=0.00..146.50 rows=1 width=22) (actual time=1.010..1.010 rows=0 loops=1)
> >                         Filter: ((failed_at IS NULL) AND ((queue)::text = 'workflows'::text) AND (((run_at <= '2019-03-26 13:25:22.208747'::timestamp without time zone) AND ((locked_at IS NULL) OR (locked_at < '2019-03-26 09:25:22.20877'::timestamp without time zone))) OR ((locked_by)::text = 'host:job01.prod.jortt.nl pid:10029'::text)))
> >                         Rows Removed by Filter: 160
>
> The seqscan+sort is going to find and return all the rows that meet
> that "filter" condition *as of the start of the query*. The LockRows
> node is going to take the first of those and lock it, which will include
> finding and locking any newer version of the row that exists due to a
> concurrent update. If there is a newer version, it then rechecks whether
> that version still satisfies the filter condition (via some magic we
> needn't get into here). If so, it returns the row to the LIMIT node,
> which returns it up and then declares it's done, so we have found and
> locked exactly one row. However, if that first row has been updated
> to a state that *doesn't* satisfy the filter condition, the LockRows
> node will advance to the next row of the seqscan+sort output, and lock
> and retest that one. This repeats till it finds a row that does still
> satisfy the filter condition post-locking.
>
> So it's fairly easy to see how concurrent updates could cause this
> query to lock N rows, for some N larger than one. But by itself
> this isn't a very satisfactory explanation for the query locking
> *all* the rows as you state happened. All of them would've had
> to be concurrently updated to states that no longer satisfy the
> filter condition, and that seems pretty unlikely to happen as a
> consequence of a few other transactions individually doing the same
> type of query.
>
> Perhaps that could happen if the outer UPDATE were itself updating the row
> to no longer satisfy the filter condition, so that the next iteration of
> the subquery then skipped over it. But (a) your example doesn't seem to
> do that, and (b) if it did happen like that then the symptoms would not
> just be that the rows were locked, but that they were all updated as well.
> (Or is that what you meant? You haven't been clear about what led you to
> conclude that all the rows got locked.)
>
> regards, tom lane
>
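
To make the mechanism Tom describes concrete, here is a hedged sketch of how
a concurrent writer can cause the LockRows node to walk past its first
candidate row (the session roles, the id value 42, and the statements are
illustrative assumptions, not taken from the actual incident):

    -- Session A runs the UPDATE ... WHERE id IN (SELECT ... FOR UPDATE LIMIT 1).
    -- The subquery's Seq Scan + Sort collects every row matching the filter as of
    -- the query's snapshot, and LockRows then tries to lock the first of them.

    -- Session B: before A acquires the lock, B updates that same row so it no
    -- longer passes the filter (e.g. by marking it as locked) and commits.
    BEGIN;
    UPDATE delayed_jobs
    SET    locked_at = now(),
           locked_by = 'host:other-worker pid:1'   -- illustrative value
    WHERE  id = 42;                                -- hypothetical id of A's first candidate
    COMMIT;

    -- Session A resumes: LockRows finds the newer version of row 42, rechecks the
    -- filter against it, sees it no longer qualifies, and advances to the next row
    -- of the sorted output, locking and retesting that one instead.  Under heavy
    -- concurrent churn this loop can lock many rows, even though LIMIT 1 only ever
    -- returns one of them per execution of the subquery.

Per Tom's last paragraph, for many rows to also end up *updated*, the
subquery would additionally have to be re-executed per outer row (for
example, on the inside of a nestloop semijoin), with each execution settling
on a different row.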
