Re: how to avoid deadlock on masive update with multiples delete

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Anibal David Acosta <aa(at)devshock(dot)com>
Subject: Re: how to avoid deadlock on masive update with multiples delete
Date: 2012-10-05 15:55:39
Message-ID: CAHyXU0yZ+qvuE_erWx86GSzTE6KTH3rtc-KEyDgJaTYM5Smx3g@mail.gmail.com
Lists: pgsql-performance

On Fri, Oct 5, 2012 at 10:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
>>> There's no guarantee that the planner won't re-sort the rows coming from
>>> the sub-select, unfortunately.
>
>> More often than not you can prevent the planner from doing that by putting
>> an OFFSET 0 in the query. Not 100% but better than nothing.
>
> No, that will accomplish exactly nothing. The ORDER BY is already an
> optimization fence. The problem is that of the several ways the planner
> might choose to join the subquery output to the original table, not all
> will produce the join rows in the same order as the subquery's
> result. For instance, when I tried his example I initially got
>
> Delete on test  (cost=400.88..692.85 rows=18818 width=34)
>   ->  Merge Join  (cost=400.88..692.85 rows=18818 width=34)
>         Merge Cond: (test.g = x.g)
>         ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
>               Sort Key: test.g
>               ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
>         ->  Sort  (cost=265.53..270.38 rows=1940 width=32)
>               Sort Key: x.g
>               ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
>                     ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
>                           Sort Key: test_1.ctid
>                           ->  Seq Scan on test test_1  (cost=0.00..29.40 rows=1940 width=10)
>
> which is going to do the deletes in "g" order, not ctid order;
> and then after an ANALYZE I got
>
> Delete on test  (cost=90.83..120.58 rows=1000 width=34)
>   ->  Hash Join  (cost=90.83..120.58 rows=1000 width=34)
>         Hash Cond: (test.g = x.g)
>         ->  Seq Scan on test  (cost=0.00..16.00 rows=1000 width=10)
>         ->  Hash  (cost=78.33..78.33 rows=1000 width=32)
>               ->  Subquery Scan on x  (cost=65.83..78.33 rows=1000 width=32)
>                     ->  Sort  (cost=65.83..68.33 rows=1000 width=10)
>                           Sort Key: test_1.ctid
>                           ->  Seq Scan on test test_1  (cost=0.00..16.00 rows=1000 width=10)
>
> which is going to do the deletes in ctid order, but that's an artifact
> of using a seqscan on the test table; the order of the subquery's output
> is irrelevant, since it got hashed.
>
>> We really need ORDER BY for DML.
>
> Meh. That's outside the SQL standard (not only outside the letter of
> the standard, but foreign to its very conceptual model) and I don't
> think the problem really comes up that often. Personally, if I had to
> deal with this I'd use a plpgsql function (or DO command) that does
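The quoted message breaks off here, but the DO-command approach Tom mentions would presumably delete the rows one at a time in a fixed order, so that every session acquires its row locks in the same sequence. A hedged sketch (the table name follows the toy example in the plans above; ordering by ctid comes from the thread, the exact loop is an assumption):

```sql
-- Hypothetical sketch of the plpgsql/DO approach: delete row by row
-- in ctid order, so concurrent sessions lock rows in the same order
-- and cannot deadlock against each other.
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT ctid FROM test ORDER BY ctid LOOP
        DELETE FROM test WHERE ctid = r.ctid;
    END LOOP;
END;
$$;
```

A real application would add a WHERE clause to the inner SELECT to pick out only the rows to be deleted; this sketch removes every row, matching the toy table.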

Can't it be forced like this (assuming it is in fact a vanilla order
by problem)?

EXPLAIN DELETE FROM test
USING (SELECT g FROM test ORDER BY ctid FOR UPDATE) x
WHERE x.g = test.g;

(emphasis on 'for update')
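The idea here is that the FOR UPDATE makes the sub-select itself lock each row as the ordered scan emits it, so locks are taken in ctid order even if the outer join later processes the deletes in some other order. A self-contained way to try it (the table definition and sample data below are assumptions roughly matching the row counts in the plans above; only the DELETE statement itself appears in the thread):

```sql
-- Hypothetical setup; the real table from the thread is not shown.
CREATE TABLE test (g int);
INSERT INTO test SELECT i % 100 FROM generate_series(1, 1000) i;
ANALYZE test;

-- The sub-select sorts by ctid and locks each row (FOR UPDATE) as it
-- is returned, before the outer DELETE joins on g.
EXPLAIN DELETE FROM test
USING (SELECT g FROM test ORDER BY ctid FOR UPDATE) x
WHERE x.g = test.g;
```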

merlin
