UPDATE with subquery too slow

From: "Eric Jain" <Eric(dot)Jain(at)isb-sib(dot)ch>
To: "pgsql-performance" <pgsql-performance(at)postgresql(dot)org>
Subject: UPDATE with subquery too slow
Date: 2004-02-17 12:38:19
Message-ID: 00ed01c3f552$ec054e80$c300000a@caliente
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I can't get the following statement to complete with reasonable time.
I've had it running for over ten hours without getting anywhere. I
suspect (hope) there may be a better way to accomplish what I'm trying
to do (set fields containing unique values to null):

UPDATE requests
SET session = NULL
WHERE session IN
(
SELECT session
FROM requests
GROUP BY session
HAVING COUNT(*) = 1
);

Output of EXPLAIN:

Nested Loop
(cost=170350.16..305352.37 rows=33533 width=98)
-> HashAggregate
(cost=170350.16..170350.16 rows=200 width=8)
-> Subquery Scan "IN_subquery"
(cost=169728.12..170261.30 rows=35545 width=8)
-> HashAggregate
(cost=169728.12..169905.85 rows=35545 width=8)
Filter: (count(*) = 1)
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075 width=8)
-> Index Scan using requests_session_idx on requests
(cost=0.00..672.92 rows=168 width=106)
Index Cond: (requests."session" = "outer"."session")

If I drop the index on requests(session):

Hash Join
(cost=170350.66..340414.12 rows=33533 width=98)
Hash Cond: ("outer"."session" = "inner"."session")
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075 width=106)
-> Hash
(cost=170350.16..170350.16 rows=200 width=8)
-> HashAggregate
(cost=170350.16..170350.16 rows=200 width=8)
-> Subquery Scan "IN_subquery"
(cost=169728.12..170261.30 rows=35545 width=8)
-> HashAggregate
(cost=169728.12..169905.85 rows=35545 width=8)
Filter: (count(*) = 1)
-> Seq Scan on requests
(cost=0.00..139207.75 rows=6104075
width=8)

The subquery itself requires 5-10 min to run on its own, and may return
several million rows.

Using EXISTS rather than IN (I'm using 7.4-RC2, not sure if IN queries
were already improved in this release):

UPDATE requests
SET session = NULL
WHERE NOT EXISTS
(
SELECT r.session
FROM requests r
WHERE
r.session = session
AND NOT r.id = id
);

With and without index:

Result
(cost=227855.74..415334.22 rows=8075449 width=101)
One-Time Filter: (NOT $0)
InitPlan
-> Seq Scan on requests r
(cost=0.00..227855.74 rows=201 width=8)
Filter: (("session" = "session") AND (id <> id))
-> Seq Scan on requests
(cost=0.00..187478.49 rows=8075449 width=101)

I've been running this for more than an hour so far, and no end in
sight, either... Any ideas?

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Leeuw van der, Tim 2004-02-17 12:49:29 Re: UPDATE with subquery too slow
Previous Message Shridhar Daithankar 2004-02-17 11:00:52 Re: Slow response of PostgreSQL