Re: Advice about how to delete

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: arnaulist(at)andromeiberica(dot)com
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Advice about how to delete
Date: 2007-07-06 16:18:53
Message-ID: 468E6B6D.1010601@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Arnau wrote:
> CREATE TABLE user_groups
> (
> user_id INT8 REFERENCES users(user_id),
> group_id INT8 REFERENCE groups(group_id),
> CONSTRAINT pk PRIMARY_KEY ( user_id, group_id)
> )
>
> CREATE INDEX idx_user_id ON user_groups( user_id );

The primary key implicitly creates an index on (user_id, group_id), so
you probably don't need this additional index.

> This works quite fast with small groups but when the group has an
> important number of users, it takes too much time. The delete_group
> action is fired from the user interface of the application.

It looks like you're not deleting rows from user_groups when a group is
deleted. Perhaps the table definition you posted misses ON DELETE
CASCADE on the foreign key declarations?

I would implement this with triggers. Use the ON DELETE CASCADE to take
care of deleting rows from user_groups and create an ON DELETE trigger
on user_groups to delete orphan rows. Like this:

CREATE OR REPLACE FUNCTION delete_orphan_users () RETURNS trigger AS $$
DECLARE
BEGIN
PERFORM * FROM user_groups ug WHERE ug.user_id = OLD.user_id;
IF NOT FOUND THEN
DELETE FROM users WHERE users.user_id = OLD.user_id;
END IF;

RETURN NULL;
END;
$$ LANGUAGE 'plpgsql';

DROP TRIGGER IF EXISTS d_usergroup ON user_groups;
CREATE TRIGGER d_usergroup AFTER DELETE ON user_groups FOR EACH ROW
EXECUTE PROCEDURE delete_orphan_users();

This might not be significantly faster, but it's easier to work with.

> Do you have any idea about how I could improve the performance of this?

Michael Glaesemann's idea of using a single statement to delete all
orphan users with one statement is a good one, though you want to refine
it a bit so that you don't need to do a full table scan every time.
Perhaps like this, before deleting rows from user_groups:

DELETE FROM users WHERE user_id IN (
SELECT u.user_id FROM users u
LEFT OUTER JOIN user_groups ug ON (u.user_id = ug.user_id AND
ug.group_id <> 10)
WHERE group_id IS NULL
AND u.user_id IN (SELECT user_id FROM user_groups where group_id = 10)
);

Or maybe you could just leave the orphans in the table, and delete them
later in batch?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Thomas Finneid 2007-07-08 10:06:29 Re: improvement suggestions for performance design
Previous Message Jignesh K. Shah 2007-07-06 16:03:12 Re: Direct I/O