Re: DELETE with LIMIT (or my first hack)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Daniel Loureiro <daniel(at)termasa(dot)com(dot)br>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Csaba Nagy <ncslists(at)googlemail(dot)com>
Subject: Re: DELETE with LIMIT (or my first hack)
Date: 2010-11-30 21:08:25
Message-ID: 4015.1291151305@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> While reading this thread, I thought of two things I think we could do
> if this feature was implemented:

> 1. Sort large UPDATE/DELETEs so it is done in heap order

> This is actually a TODO item. I imagine it would be possible to do
> something like:
> DELETE FROM foo USING (...) ORDER BY ctid;
> with this patch to help this case.

Well, that's strictly an implementation detail; it is not a reason to
expose ORDER BY to the user, and even less of a reason to invent LIMIT.
It also hasn't got any of the problems we were discussing with
inheritance situations, since it'd be perfectly OK (in fact probably
desirable) to sort each table's rows separately.

> 2. Reducing deadlocks in big UPDATE/DELETEs

> One problem that sometimes occurs when doing multiple multi-row UPDATEs
> or DELETEs concurrently is that the transactions end up working on the
> same rows, but in a different order. One could use an ORDER BY clause
> to make sure the transactions don't deadlock.

That, on the other hand, seems like potentially a valid use-case. Note
that the user-given order would have to override any internal attempt to
order by ctid for this to be usable.

I had thought of a slightly different application, which could be
summarized with this example:

UPDATE sometab SET somecol = nextval('seq') ORDER BY id;

with the expectation that somecol's values would then fall in the same
order as the id column. Unfortunately, that won't actually *work*
reliably, the reason being that ORDER BY is applied after targetlist
computation. I think enough people would get burnt this way that we'd
have popular demand to make ORDER BY work differently in UPDATE than it
does in SELECT, which seems rather ugly not only from the definitional
side but the implementation side.

(DELETE escapes this issue because it has no user-definable elements in
its targetlist, which is another way that DELETE is simpler here.)

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-11-30 21:09:40 Re: KNNGIST next step: adjusting indexAM API
Previous Message Josh Berkus 2010-11-30 20:58:09 Re: Spread checkpoint sync