Skip site navigation (1) Skip section navigation (2)

Re: how to avoid deadlock on masive update with multiples delete

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Anibal David Acosta <aa(at)devshock(dot)com>
Subject: Re: how to avoid deadlock on masive update with multiples delete
Date: 2012-10-05 15:46:05
Message-ID: 10193.1349451965@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-performance
Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
>> There's no guarantee that the planner won't re-sort the rows coming from
>> the sub-select, unfortunately.

> More often than not you can prevent the planner from doing that by putting a 
> OFFSET 0 in the query. Not 100% but better than nothing.

No, that will accomplish exactly nothing.  The ORDER BY is already an
optimization fence.  The problem is that of the several ways the planner
might choose to join the subquery output to the original table, not all
will produce the join rows in the same order as the subquery's result
is.  For instance, when I tried his example I initially got

 Delete on test  (cost=400.88..692.85 rows=18818 width=34)
   ->  Merge Join  (cost=400.88..692.85 rows=18818 width=34)
         Merge Cond: (test.g = x.g)
         ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
               Sort Key: test.g
               ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
         ->  Sort  (cost=265.53..270.38 rows=1940 width=32)
               Sort Key: x.g
               ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
                     ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
                           Sort Key: test_1.ctid
                           ->  Seq Scan on test test_1  (cost=0.00..29.40 rows=1940 width=10)

which is going to do the deletes in "g" order, not ctid order;
and then after an ANALYZE I got

 Delete on test  (cost=90.83..120.58 rows=1000 width=34)
   ->  Hash Join  (cost=90.83..120.58 rows=1000 width=34)
         Hash Cond: (test.g = x.g)
         ->  Seq Scan on test  (cost=0.00..16.00 rows=1000 width=10)
         ->  Hash  (cost=78.33..78.33 rows=1000 width=32)
               ->  Subquery Scan on x  (cost=65.83..78.33 rows=1000 width=32)
                     ->  Sort  (cost=65.83..68.33 rows=1000 width=10)
                           Sort Key: test_1.ctid
                           ->  Seq Scan on test test_1  (cost=0.00..16.00 rows=1000 width=10)

which is going to do the deletes in ctid order, but that's an artifact
of using a seqscan on the test table; the order of the subquery's output
is irrelevant, since it got hashed.

> We really need ORDER BY for DML.

Meh.  That's outside the SQL standard (not only outside the letter of
the standard, but foreign to its very conceptual model) and I don't
think the problem really comes up that often.  Personally, if I had to
deal with this I'd use a plpgsql function (or DO command) that does

	FOR c IN SELECT ctid FROM table WHERE ... ORDER BY ... LOOP
		DELETE FROM table WHERE ctid = c;
	END LOOP;

which is not great but at least it avoids client-to-server traffic.

Having said all that, are we sure this is even a deletion-order
problem?  I was wondering about deadlocks from foreign key references,
for instance.

			regards, tom lane


In response to

Responses

pgsql-performance by date

Next:From: Merlin MoncureDate: 2012-10-05 15:55:39
Subject: Re: how to avoid deadlock on masive update with multiples delete
Previous:From: Andres FreundDate: 2012-10-05 15:36:14
Subject: Re: how to avoid deadlock on masive update with multiples delete

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group