Re: Delete query takes exorbitant amount of time

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Karim A Nassar <Karim(dot)Nassar(at)NAU(dot)EDU>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Delete query takes exorbitant amount of time
Date: 2005-03-29 11:29:36
Message-ID: 1112095776.11750.958.camel@localhost.localdomain
Lists: pgsql-performance

On Tue, 2005-03-29 at 01:48 -0700, Karim A Nassar wrote:
> > Each value has 1/13th of the table, which is too many rows per value to
> > make an IndexScan an efficient way of deleting rows from the table.
>
> But, the original question was that the delete that was taking a long time
> was on a different table. I tried to delete 150 rows from a table with 750
> rows, which is FK referenced from this large table. If I understand
> correctly, Tom suggested that the length of time was due to a sequential
> scan being done on the large table for each value being deleted from the
> small one.

> For this FK check, there only need be one referring id to invalidate the
> delete. ISTM that for any delete with a FK reference, the index could
> always be used to search for a single value in the referring table
> (excepting very small tables). Why then must a sequential scan be
> performed in this case, and/or in general?

My understanding was that you were doing a DELETE on the smaller table,
and that this also deleted from the measurement table because you had
the FK defined as ON DELETE CASCADE. You are right: only a single row
is sufficient to RESTRICT the DELETE. But for an ON UPDATE/DELETE
action of CASCADE, you will want to touch all referencing rows, so a
SeqScan is a perfectly valid consequence of such actions.
I now think that you are using the default action, rather than
specifically requesting CASCADE?
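For reference, a minimal sketch of the two FK definitions being
contrasted (the table and column names here are hypothetical, loosely
based on the thread's description of a small referenced table and a
large measurement table):

```sql
-- Default referential action (NO ACTION): a DELETE on the small table
-- only needs to find one referencing row to raise an error.
ALTER TABLE measurement
  ADD CONSTRAINT measurement_fk
  FOREIGN KEY (id_int_sensor_meas_type)
  REFERENCES int_sensor_meas_type (id_int_sensor_meas_type);

-- ON DELETE CASCADE: every referencing row must itself be deleted,
-- so a SeqScan over the large table can be a reasonable plan.
ALTER TABLE measurement
  ADD CONSTRAINT measurement_fk_cascade
  FOREIGN KEY (id_int_sensor_meas_type)
  REFERENCES int_sensor_meas_type (id_int_sensor_meas_type)
  ON DELETE CASCADE;
```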

Stephan, Tom:
The SQL generated for RI checking by the RI triggers currently applies a
limit at execution time, not at prepare time. i.e. there is no LIMIT
clause in the SQL.

We know whether the check will be limit 1 or limit 0 at prepare time, so
why not add a LIMIT clause to the SQL so it changes the plan, not just
the number of rows returned when the check query executes?
(I note that PREPARE does allow you to supply a LIMIT 1 clause).
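A hedged sketch of the proposal (the table and column names are
hypothetical, and the SQL actually generated by the RI triggers may
differ in detail):

```sql
-- Roughly the check the RI triggers issue today: no LIMIT in the SQL,
-- so the planner costs it as if all matching rows will be fetched.
SELECT 1 FROM ONLY measurement x
 WHERE id_int_sensor_meas_type = $1 FOR UPDATE OF x;

-- The proposal: put the limit in the prepared SQL itself, so the
-- planner knows only one row is needed and can favour an index scan.
SELECT 1 FROM ONLY measurement x
 WHERE id_int_sensor_meas_type = $1 FOR UPDATE OF x LIMIT 1;
```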

That *ought* to have some effect on the plan used by the RI check
queries. In costsize.c:cost_index we would have tuples_fetched==1 and it
would be hard (but not impossible) for the index cost to ever be more
than the cost of a SeqScan.

...but, I see no way for OidFunctionCall8 to ever return an answer of
"always just 1 row, no matter how big the relation"...so tuples_fetched
is always proportional to the size of the relation. Are unique indexes
treated just as very-low-selectivity indexes? They present a very
similar situation, in terms of forcing an absolute, not relative,
number of rows returned.
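A small worked illustration of the proportionality (the numbers and
names are hypothetical): with 13 distinct values spread over a
1,000,000-row table, an equality predicate gets selectivity 1/13, so
tuples_fetched is estimated at roughly 77,000 rows however the query is
limited, whereas a unique column pins the estimate at one row:

```sql
-- ~13 distinct values over 1,000,000 rows: the planner estimates
-- 1/13 * 1,000,000 ≈ 76,923 rows, so a SeqScan looks cheaper than
-- an index scan, even though the RI check needs only one row.
EXPLAIN SELECT 1 FROM measurement
 WHERE id_int_sensor_meas_type = 42;

-- A unique-indexed column, by contrast, yields an estimate of 1 row,
-- and the index scan wins on cost.
EXPLAIN SELECT 1 FROM measurement
 WHERE measurement_pk = 42;
```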

Best Regards, Simon Riggs
