problems with a sub-select (takes donkeys' years)

From: Stuart Rison <stuart(at)ludwig(dot)ucl(dot)ac(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: problems with a sub-select (takes donkeys' years)
Date: 1999-03-29 16:32:57
Message-ID: v04003a05b325562e3c99@[128.40.242.190]
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

I've got two SELECT statements which work fine (take about 5secs each) when
I do them sequentially but take ages (around 5mins) if in a statement with
a sub-SELECT.

The individual SELECTs are:

SELECT p1.brecard_id FROM
malignant_pathologies p1,
malignant_pathologies p2,
malignant_pathologies p3
WHERE p1.code='MAPH'
AND p1.brecard_id=p2.brecard_id
AND p2.code='AMCA'
AND p3.brecard_id=p1.brecard_id
AND p3.code='LOCA';

brecard_id
----------------
DSTL12031999016
DPHA12031999017
DCCH12031999056
DUCH12031999059
DUCH12031999063
DCCH12031999077
DUCH12031999098
(7 rows)

SELECT brecard_id,count(brecard_id) FROM
malignant_pathologies
WHERE brecard_id IN
('DSTL12031999016','DPHA12031999017','DCCH12031999056','DUCH12031999059','DUCH12
031999063','DCCH12031999077','DUCH12031999098')
GROUP BY brecard_id;

brecard_id |count
----------------+-----
DCCH12031999056 | 10
DCCH12031999077 | 12
DPHA12031999017 | 11
DSTL12031999016 | 11
DUCH12031999059 | 13
DUCH12031999063 | 6
DUCH12031999098 | 14
(7 rows)

(This is just an example query, what I'm trying to do here is use the list
of brecard_id's generated by the previous query).

Both of these execute in about 5 seconds (on 1000 rows)

When I put them together as:

SELECT brecard_id,count(brecard_id) FROM
malignant_pathologies
WHERE brecard_id IN (
SELECT p1.brecard_id FROM
malignant_pathologies p1,
malignant_pathologies p2,
malignant_pathologies p3
WHERE p1.code='MAPH'
AND p1.brecard_id=p2.brecard_id
AND p2.code='AMCA'
AND p3.brecard_id=p1.brecard_id
AND p3.code='LOCA');

It takes around 5mins for the query to complete! The EXPLAIN for the quey is:

NOTICE: QUERY PLAN:

Aggregate (cost=4.27 size=0 width=0)
-> Group (cost=4.27 size=0 width=0)
-> Sort (cost=4.27 size=0 width=0)
-> Seq Scan on malignant_pathologies (cost=4.27 size=99
width=12)
SubPlan
-> Nested Loop (cost=8.27 size=1 width=36)
-> Nested Loop (cost=6.27 size=1 width=24)
-> Seq Scan on malignant_pathologies p3
(cost=4.27 size=1 width=12)
-> Index Scan using
malignant_pathologies_pkey on malignant_pathologies p1 (cost=2.00 size=1
width=12)
-> Index Scan using malignant_pathologies_pkey
on malignant_pathologies p2 (cost=2.00 size=1 width=12)

EXPLAIN

[Which is not different from putting together the EXPLAINs from each of the
individual queries]

Can anyone explain why the sub-query form takes so long?

Regards,

Stuart.

+-------------------------+--------------------------------------+
| Stuart Rison | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street |
| Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. |
| Fax. (0171) 878 4040 | stuart(at)ludwig(dot)ucl(dot)ac(dot)uk |
+-------------------------+--------------------------------------+

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gerard Saraber 1999-03-29 16:44:09 backing up pgsql
Previous Message Herouth Maoz 1999-03-29 16:00:00 Re: [GENERAL] Returning an integer from a date