Re: Deleting takes days, should I add some index?

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Deleting takes days, should I add some index?
Date: 2021-02-25 20:48:51
Message-ID: CAApHDvrC3iVcfjU5-bKQfE-Hxy7_Jr24mv18EzMcqUZLAu0qGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 26 Feb 2021 at 02:06, Alexander Farber
<alexander(dot)farber(at)gmail(dot)com> wrote:
> However the deletion still takes forever and I have to ctrl-c it:
>
> # delete from words_games where created < now() - interval '12 month';
>
> Do you please have any further suggestions?
>
> When I try to prepend "explain analyze" to the above query, then in the production database it also lasts forever.

EXPLAIN with ANALYZE executes the query. So it'll probably to take just as long.

Since your foreign keys perform a cascade delete on the tables
referencing the tables you're deleting from, any records in those
referencing tables will be deleted too. You must also look at those
referencing tables and see what references those and index the
column(s) which are referencing.

Here's a simplified example that's easier to understand than your case.

Setup:
create table t1 (id int primary key);
create table t2 (id int primary key, t1_id int not null references t1
on update cascade on delete cascade);
create index on t2 (t1_id);
create table t3 (id int primary key, t2_id int not null references t2
on update cascade on delete cascade);

So I have 2 levels of reference. t2 -> t1 and t3 -> t2.
If I remove a row from t1 then PostgreSQL must perform: DELETE FROM t2
WHERE t1_id = <id value of t1 row deleted>;

Luckily I indexed t2(t1_id), so that should be fast.

Since t3 references t2, the database must also perform: DELETE FROM t3
WHERE t2_id = <id value of t2 row deleted>; for the row that gets
removed from t2.

Unfortunately, I forgot to index t3(t2_id).

Let me insert some data and see how the lack of index effects performance:

insert into t1 select x from generate_Series(1,1000000) x;
insert into t2 select x,x from generate_Series(1,1000000) x;
insert into t3 select x,x from generate_Series(1,1000000) x;

Delete 100 records.

delete from t1 where id <= 100;
DELETE 100
Time: 8048.975 ms (00:08.049)

Pretty slow.

create index on t3 (t2_id);
CREATE INDEX

(truncate t1 cascade and reinsert the data)

delete from t1 where id <= 100;
DELETE 100
Time: 5.151 ms

Better.

So, you need to follow each of the "Referenced by" from the table
you're deleting from. In the \d output, just ignore the tables
mentioned in "Foreign-key constraints:". Those are only checked on
INSERT/UPDATE and must already contain a proper unique constraint and
therefore index.

David

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Valeria Kaplan 2021-02-25 20:49:14 Re: Code of Conduct: Hebrew Translation for Review
Previous Message Rumpi Gravenstein 2021-02-25 19:52:20 Re: Postgres Analog of Oracle APPEND hint