Re: how to speed up query

From: Erwin Brandstetter <brsaweda(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: how to speed up query
Date: 2007-06-15 06:03:07
Message-ID: 1181887387.725496.174190@p77g2000hsh.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Jun 13, 3:13 pm, "Andrus" <kobrule(dot)(dot)(dot)(at)hot(dot)ee> wrote:
(...)
> As I understand, only way to optimize the statement
>
> delete from firma1.rid where dokumnr not in (select dokumnr from
> firma1.dok);
>
> assuming that firma1.dok.dokumnr does not contain null values is to change
> it to
>
> CREATE TEMP TABLE mydel AS
> SELECT r.dokumnr
> FROM rid r
> LEFT JOIN dok d USING (dokumnr)
> WHERE d.dokumnr IS NULL;
> DELETE FROM rid USING mydel WHERE rid.dokumnr =mydel.dokumnr;
> drop table mydel;

As I mentioned when I proposed it, the temp table may not even be
necessary. The important part is the LEFT JOIN instead of the NOT IN
(as Martijn has explained).
You could try the direct approach ...

DELETE FROM rid
USING ( SELECT r.dokumnr
FROM rid r
LEFT JOIN dok d USING (dokumnr)
WHERE d.dokumnr IS NULL) x
WHERE rid.dokumnr = x.dokumnr;

... and see which runs faster. Probably it does not make much of a
difference.

If the temp table works for you, you might be interested in a new
feature of 8.2: CREATE TEMP TABLE AS ...ON COMMIT DROP;
http://www.postgresql.org/docs/current/static/sql-createtableas.html

(...)

> explain analyze SELECT r.dokumnr
> FROM rid r
> LEFT JOIN dok d USING (dokumnr)
> WHERE d.dokumnr IS NULL
>
> returns
>
> "Hash Left Join (cost=7760.27..31738.02 rows=1 width=4) (actual
> time=2520.904..2520.904 rows=0 loops=1)"
> " Hash Cond: (r.dokumnr = d.dokumnr)"
> " Filter: (d.dokumnr IS NULL)"
> " -> Seq Scan on rid r (cost=0.00..17424.24 rows=202424 width=4) (actual
> time=0.032..352.225 rows=202421 loops=1)"
> " -> Hash (cost=6785.01..6785.01 rows=56101 width=4) (actual
> time=211.150..211.150 rows=56079 loops=1)"
> " -> Seq Scan on dok d (cost=0.00..6785.01 rows=56101 width=4)
> (actual time=0.021..147.805 rows=56079 loops=1)"
> "Total runtime: 2521.091 ms"

If the indices are present (and visible) at the time of execution, as
you described it, we should be seeing index scans on dok_dokumnr_idx
and rid_dokumnr_idx instead of sequential scans.

That's what I get on a similar query in one of my databases:
EXPLAIN ANALYZE SELECT a.adr_id FROM cp.adr a LEFT JOIN cp.kontakt k
USING (adr_id) WHERE k.adr_id IS NULL;

Merge Left Join (cost=0.00..1356.31 rows=10261 width=4) (actual
time=0.096..56.759 rows=3868 loops=1)
Merge Cond: ("outer".adr_id = "inner".adr_id)
Filter: ("inner".adr_id IS NULL)
-> Index Scan using adr_pkey on adr a (cost=0.00..947.54
rows=10261 width=4) (actual time=0.012..23.118 rows=10261 loops=1)
-> Index Scan using kontakt_adr_id_idx on kontakt k
(cost=0.00..295.47 rows=7011 width=4) (actual time=0.007..13.299
rows=7011 loops=1)
Total runtime: 58.510 ms

Regards
Erwin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Albe Laurenz 2007-06-15 07:51:21 Re: DeadLocks...
Previous Message A. Kretschmer 2007-06-15 05:03:36 Re: Historical Data Question