Best way to delete unreferenced rows?

From: "Tyrrill, Ed" <tyrrill_ed(at)emc(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Best way to delete unreferenced rows?
Date: 2007-06-07 20:02:55
Message-ID: A23190A408F7094FAF446C1538222F7604092F3B@avaexch01.avamar.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hey All,

I have a table, let's call it A, whose primary key, a_id, is referenced
in a second table, let's call it B. For each unique A.a_id there are
generally many rows in B with the same a_id. My problem is that I want
to delete a row in A when the last row in B that references it is
deleted. Right now I just query for rows in A that aren't referenced by
B, and that worked great when the tables were small, but it takes over
an hour now that the tables have grown larger (over 200 million rows in
B and 14 million in A). The delete has to do a sequential scan of both
tables since I'm looking for what's not in the indexes.

I was going to try creating a trigger after delete on B for each row to
check for more rows in B with the same a_id, and delete the row in A if
none found. In general I will be deleting 10's of millions of rows from
B and 100's of thousands of rows from A on a daily basis. What do you
think? Does anyone have any other suggestions on different ways to
approach this?

Thanks,
Ed

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dan Harris 2007-06-07 21:26:56 Re: reclaiming disk space after major updates
Previous Message Greg Smith 2007-06-07 19:42:41 Re: VERY slow queries at random