Re: BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery

From: Lars Vonk <lars(dot)vonk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15715: UPDATE using "in (subquery for update limit 1)" does not respect the limit in subquery
Date: 2019-03-27 07:22:39
Message-ID: CAMX1Thju8rO1cqbZPq=qj7+uENmssPw4SEBC2tzQxcUQ8AEDag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> And have you actually seen any such failures?

No not yet. And probably won't, given your explanation.

So what I understand so far is that allthough the query is not garantueed
to return the same single row, it is not possible it returned and updated
multiple rows in a single execution?

> Without seeing an
> EXPLAIN for this query on your system, we can't know whether it's being
> done like that

I ran the query again with EXPLAIN ANALYZE (but I am unsure if this means
anything since we do not have the exact EXPLAIN of that particular query):

Update on delayed_jobs (cost=146.81..154.85 rows=1 width=758) (actual
time=1.035..1.035 rows=0 loops=1)
-> Nested Loop (cost=146.81..154.85 rows=1 width=758) (actual
time=1.035..1.035 rows=0 loops=1)
-> HashAggregate (cost=146.54..146.55 rows=1 width=32) (actual
time=1.035..1.035 rows=0 loops=1)
Group Key: "ANY_subquery".id
-> Subquery Scan on "ANY_subquery" (cost=146.51..146.53
rows=1 width=32) (actual time=1.034..1.034 rows=0 loops=1)
-> Limit (cost=146.51..146.52 rows=1 width=22)
(actual time=1.033..1.034 rows=0 loops=1)
-> LockRows (cost=146.51..146.52 rows=1
width=22) (actual time=1.033..1.033 rows=0 loops=1)
-> Sort (cost=146.51..146.51 rows=1
width=22) (actual time=1.033..1.033 rows=0 loops=1)
Sort Key: delayed_jobs_1.priority,
delayed_jobs_1.run_at
Sort Method: quicksort Memory: 25kB
-> Seq Scan on delayed_jobs
delayed_jobs_1 (cost=0.00..146.50 rows=1 width=22) (actual
time=1.010..1.010 rows=0 loops=1)
Filter: ((failed_at IS NULL)
AND ((queue)::text = 'workflows'::text) AND (((run_at <= '2019-03-26
13:25:22.208747'::timestamp without time zone) AND ((locked_at IS NULL) OR
(locked_at < '2019-03-26 09:25:22.20877'::timestamp without time zone))) OR
((locked_by)::text = 'host:job01.prod.jortt.nl pid:10029'::text)))
Rows Removed by Filter: 160
-> Index Scan using delayed_jobs_pkey on delayed_jobs
(cost=0.28..8.29 rows=1 width=206) (never executed)
Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.367 ms
Execution Time: 1.076 ms
(17 rows)

> Either way, a WITH is probably preferable because that does get you
> into a place where we guarantee single-evaluation semantics.

Will do.

Thanks again for your time and patience to explain.

Lars

On Tue, Mar 26, 2019 at 11:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Lars Vonk <lars(dot)vonk(at)gmail(dot)com> writes:
> > Thanks for the explanation. Our hotfix was to replace the IN with a = so
> > the query fails and the transaction is rolled back when more than one is
> > returned from the sub-SELECT.
>
> And have you actually seen any such failures? If you have, then there
> is certainly something going wrong here. The LIMIT should never allow
> more than one row to be returned by the sub-SELECT, period, full stop.
>
> The issue here is that the sub-SELECT is, in principle, executed afresh
> for every row of the outer query, and you do not have a guarantee that
> each such execution returns the *same* single row. Ordinary query
> execution would provide such a guarantee, but you're using FOR UPDATE
> on a table that's being modified concurrently (including by this query
> itself), and therefore the guarantee disappears.
>
> You might think that since the sub-SELECT is uncorrelated with the outer
> query, there's no need to execute it more than once ... but that's an
> optimization, not part of the guaranteed semantics. Without seeing an
> EXPLAIN for this query on your system, we can't know whether it's being
> done like that (though the fact that you're complaining suggests that
> it isn't).
>
> Interestingly, it's quite likely that your "hotfix" made the problem
> go away, because PG generally *does* optimize uncorrelated sub-SELECTs
> to be executed only once if they return scalar results. That doesn't
> happen for IN subqueries though, as those are treated as joins.
>
> Either way, a WITH is probably preferable because that does get you
> into a place where we guarantee single-evaluation semantics.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dean Rasheed 2019-03-27 12:46:29 Re: BUG #15708: RLS 'using' running as wrong user when called from a view
Previous Message Michael Paquier 2019-03-27 02:56:20 Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table