Re: BUG #17590: [OPTIMIZER] DELETE never ends on a small table, found a workaround which makes it instant

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: g(dot)fouet(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org>
Subject: Re: BUG #17590: [OPTIMIZER] DELETE never ends on a small table, found a workaround which makes it instant
Date: 2022-08-19 16:17:45
Message-ID: 5b0e5bab-08ac-0942-3bad-9b122b513f94@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 8/19/22 17:13, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 17590
> Logged by: Guillaume FOUET
> Email address: g(dot)fouet(at)gmail(dot)com
> PostgreSQL version: 14.5
> Operating system: Windows 10 - 64x
> Description:
>
> Tested with Postgres 13.5, 13.8 and 14.5 (I updated to see if that was a
> bug).
>
> I have four tables: ADDRESS, COMPANY, CONTACT, TRIP
> ADDRESS has ~130000 rows
> COMPANY has a FK toward ADDRESS and ~16000 rows
> CONTACT has a FK toward ADDRESS and ~12000 rows
> TRIP has a FK toward ADDRESS and ~137500 rows
>
> We wanted to purge the address table from old, unused addresses:
> DELETE FROM address WHERE address_id NOT IN (
> SELECT DISTINCT address_id FROM company
> UNION
> SELECT DISTINCT address_id FROM contact
> UNION
> SELECT DISTINCT address_id FROM trip
> );
>
> This query above never ends (I waited 15 minutes and it was still going, HDD
> doing nothing, one core CPU used).
> The EXPLAIN says it materializes the address_ids aggregates then scans
> ADDRESS for deletion.

It's probably better to include the query plan. Anyway, the union
essentially creates a new relation, making indexes (on the base
relations unusable).

>
> After many tries, I made this query instead:
> CREATE TEMPORARY TABLE used_address_id AS (
> SELECT DISTINCT address_id FROM company
> UNION
> SELECT DISTINCT address_id FROM contact
> UNION
> SELECT DISTINCT address_id FROM trip
> );
> CREATE UNIQUE INDEX ON used_address_id (address_id);
> DELETE FROM used_address_id WHERE address_id IS NULL;
> DELETE FROM address WHERE address_id NOT IN (SELECT address_id FROM
> used_address_id);
>
> This was resolved in 500ms (basically instantly).
> I have the feeling there's an O(n²) somewhere in the first query.

Yeah. The temporary table means we can use the indexes again.

I'm not sure I'd call this a bug, it's simply how we deal with this sort
of queries. Maybe try splitting the one "NOT IN" condition into a
separate condition per table. I mean, something like

DELETE FROM address
WHERE address_id NOT IN (SELECT address_id FROM company)
AND address_id NOT IN (SELECT address_id FROM contact)
AND address_id NOT IN (SELECT address_id FROM trip)

or something like that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Etsuro Fujita 2022-08-21 12:31:35 Re: foreign join error "variable not found in subplan target list"
Previous Message PG Bug reporting form 2022-08-19 15:13:59 BUG #17590: [OPTIMIZER] DELETE never ends on a small table, found a workaround which makes it instant