Re: MOVE

From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "Postgres general mailing list" <pgsql-general(at)postgresql(dot)org>
Subject: Re: MOVE
Date: 2005-01-15 00:07:47
Message-ID: opskmam9mtth1vuj@musicbox
Lists: pgsql-general pgsql-hackers


> BEGIN;
> INSERT INTO archive (...) SELECT ... FROM active WHERE user_id = ...;
> DELETE FROM active WHERE user_id = ...;
> COMMIT;
>
> The DELETE can only delete the rows returned by the select, that's the
> whole point of transactions...

Well, in the case of having a unique index on user_id, and if no one
updates the row between the insert and the delete, it will work ;)
And if someone updates it in between, well, the update is not archived,
so you have to LOCK your row FOR UPDATE.
And if this procedure is called twice at the same time, the rows will be
historized twice...
etc...

Which is precisely why I don't like this approach !

As a side note, I've installed 8.0 rc, and wow. The slow queries feel a
lot faster on the command prompt, and my small queries have become faster
too... very good work !

In the end, I've implemented it with an AFTER DELETE trigger on the
'live' table, which, after the row has been deleted, inserts it in the
history table, using the magic variable OLD. This will work because the
row is already deleted, thus can't be concurrently updated by another
transaction (because a transaction trying to update a row will wait on the
lock acquired by the DELETE, and vice versa).
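
For what it's worth, here is a minimal sketch of what such a trigger can
look like; live_sessions, sessions_history and the column names are
placeholders I made up for the example, not the real schema:

  CREATE FUNCTION historize_session() RETURNS trigger AS '
  BEGIN
      -- the row is already gone from the live table, so no other
      -- transaction can still be updating it; just copy the OLD values
      INSERT INTO sessions_history (session_id, user_id, started, data)
          VALUES (OLD.session_id, OLD.user_id, OLD.started, OLD.data);
      RETURN OLD;
  END;
  ' LANGUAGE plpgsql;

  CREATE TRIGGER sessions_historize
      AFTER DELETE ON live_sessions
      FOR EACH ROW EXECUTE PROCEDURE historize_session();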

So, for ONE row at a time, in a trigger, it works beautifully, thank you
Postgres ! I find the solution very elegant : when a session expires, it
is deleted from the live session table, and the trigger catches it just
in time to shove it into the sessions history table. Then some other
tables, like user-to-chatroom connections, which happen to have a user_id
referencing the live sessions table, get the ON DELETE CASCADE and are
also purged and historized automatically. I am very happy with this
solution BUT it's done one row at a time, so it's slower than I'd like !
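
Again with the same made-up names, and assuming user_id is the primary
key of live_sessions, the cascading part is just an ordinary foreign key;
the dependent table then gets its own AFTER DELETE trigger in the same
style:

  CREATE TABLE chatroom_connections (
      user_id     INTEGER NOT NULL
                  REFERENCES live_sessions (user_id) ON DELETE CASCADE,
      chatroom_id INTEGER NOT NULL
  );
  -- deleting a session from live_sessions cascades into this table, and
  -- an AFTER DELETE trigger here historizes the connection rows as well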

The key is to insert the row deleted from the live table into the history
table AFTER it has been deleted, to avoid all the funky lock problems. Now
consider the following : you must DELETE several items at the same time
and historize them.

If you INSERT then DELETE:
- records can be inserted, updated or deleted between the two. The
inserted ones will be historized but not deleted (duplicates !), the
deleted ones will be lost forever, unhistorized, and the updated ones
won't have their updates historized. Not very good for concurrency !

You can lock the rows FOR UPDATE beforehand; this solves the UPDATE and
DELETE problem, but not the INSERT problem.
You can, of course, lock the entire table, but well, it reminds me too
much of the MySQL way.
You can also use SERIALIZABLE mode, which solves all the problems, but if
something goes wrong, everything fails and you have to retry the whole
transaction, whereas a proper lock would just be waited on...
If there is a primary key in the 'live' table you can SELECT FOR UPDATE
into a temporary table, then delete using the pkeys in the temp table,
then insert from the temp table... ugly !
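
Spelled out with the same placeholder names (and without the trigger in
place), that ugly last variant looks something like this:

  BEGIN;
  -- lock the expired rows and stash a copy of them in a temp table
  SELECT * INTO TEMPORARY TABLE doomed
      FROM live_sessions
      WHERE expires < now()
      FOR UPDATE;
  -- delete from the live table using the pkeys saved in the temp table
  DELETE FROM live_sessions
      WHERE session_id IN (SELECT session_id FROM doomed);
  -- finally copy the saved rows into the history table
  INSERT INTO sessions_history SELECT * FROM doomed;
  COMMIT;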

That's why I'm bothering you about the possibility of DELETE returning
the DELETEd rows ;)

It's not very useful if you process one row, but when you process several
at a time it would be really great, because instead of 2*N queries
(DELETE + INSERT hidden in a trigger) you'd just do one (INSERT ... DELETE
AND SELECT ... FROM ...). Today, if you don't want to do it in a trigger,
you have to have a unique index and do SELECT FOR UPDATE, INSERT, DELETE;
that's three queries per row.
In a perfect world, this would then be used in an ON DELETE rule which
would replace the DELETEs with deletes that insert the rows into the
history table.
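
To make the wish concrete, with the same placeholder tables, the single
statement could look something like this (purely hypothetical syntax, it
does not exist in any released PostgreSQL) :

  -- hypothetical : the DELETE hands its rows straight to the INSERT
  INSERT INTO sessions_history
      DELETE AND SELECT * FROM live_sessions WHERE expires < now();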

Also, I've thought about some other interesting applications : if DELETE
returns rows, why not UPDATE or even INSERT ?
Many applications use INSERT... then SELECT currval(sequence). I also
like to set defaults in the database, like for instance timestamp fields
defaulting to now(), or things like that. I have a tree table with an
ltree field which is generated by a trigger from the parent's path and
the current row's id. Some other fields are also inherited from the
parent. Why not do INSERT INTO ... AND SELECT ..., which would return the
sequence field and any other fields which have been initialized by ON
INSERT triggers... this would be neat... instead of INSERT, SELECT
currval, SELECT .. FROM table WHERE id=...
Same thing for ON UPDATE triggers.
You could replace some plpgsql procedures with one query, and what's more
important, not worry about locking headaches.
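
For instance, today getting a freshly inserted row back together with its
sequence id and its trigger-filled fields takes three queries; the table,
sequence and column names below are invented for the example:

  INSERT INTO tree (parent_id, name) VALUES (42, 'leaf');
  SELECT currval('tree_id_seq');            -- fetch the new id
  SELECT id, path, created FROM tree        -- re-read the fields filled in
      WHERE id = currval('tree_id_seq');    -- by defaults and triggers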

Anyway, my problem is solved now with triggers, but I like the idea very
much (and Oracle has it) (and Tom once said a DELETE was just more or less
like a SELECT)... so ...

Regards
