RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)

From: "Jackson, DeJuan" <djackson(at)cpsgroup(dot)com>
To: phd2(at)earthling(dot)net, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Cc: PGSQL SQL <pgsql-sql(at)postgreSQL(dot)org>
Subject: RE: [HACKERS] Bug on complex subselect (was: Bug on complex join)
Date: 1999-03-10 20:18:15
Message-ID: F10BB1FAF801D111829B0060971D839F703F67@cpsmail
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

> Hello!
>
> I rewrote my 4-tables join to use subselects:
>
> SELECT DISTINCT subsec_id FROM positions
> WHERE pos_id IN
> (SELECT DISTINCT pos_id
> FROM central
> WHERE shop_id IN
> (SELECT shop_id FROM shops
> WHERE distr_id IN
> (SELECT distr_id FROM districts
> WHERE city_id = 2)
> )
> )
> ;
>
> This does not work, either - postgres loops forever, until I cancel
> psql.
>
> I splitted it - I ran
>
> (SELECT DISTINCT pos_id
> FROM central
> WHERE shop_id IN
> (SELECT shop_id FROM shops
> WHERE distr_id IN
> (SELECT distr_id FROM districts
> WHERE city_id = 2)
> )
> )
>
> and stored result in a file. Then I substituted the
> subselect with the
> file:
>
> SELECT DISTINCT subsec_id FROM positions
> WHERE pos_id IN
> (1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41,
> 42, 44, 45,
> 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62, 63, 64)
>
> and got desired result within a second.
>
> This finally solves my problem, but I need to pass a long
> way to find
> that postgres cannot handle such not too complex joins and subselects.

If you think about the query long enough you'll realize that several
things have to be assummed for that query to be efficient.
Looking at you final query first, and assuming that you have an index on
positions(pos_id):
SELECT DISTINCT subsec_id FROM positions
WHERE pos_id IN
(1, 2, 3, 6, 22, 25, 26, 27, 28, 29, 31, 33, 34, 35, 38, 41,
42, 44, 45, 46, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
60, 61, 62, 63, 64)

This turns into 36 OR clauses in the backend; which as you have
expressed is not a problem for PostgreSQL. But as soon as you make that
IN clause a subselect Postgres can't assume that the results will be
static for each row that it's evaluating, therefore you have:
X rows in positions compared to Y rows from the subselect = X*Y
compares
Let's assume that there are only 10 rows in positions, and that each
select of central return 36 rows. We get up to 360 comparisons for that
query, and none of the compares is likely to use the index because they
are OR'ed together.
Now let's throw in the other subselect and assume that each table only
has 10 rows besides for central which obviously has to have more so
we'll assume 40.
10 rows from district
indexed on city_id (*YAY*) (btree index (maybe 2 compares)
-------------------------
2 rows results
OR= 10 rows from shops + 2) * 10
-------------------------
5 rows results
OR= 40 rows from central + 5) * 40
-------------------------
36 rows results
OR= 10 rows from position + 36) * 10)
------------------------- ------------------
5 rows results 18360 comparisons for this query
And you have to remember that only the innermost subselect is likely to
use an index. (My math could be wrong but,) I think you get the idea.

Try your query this way:
SELECT DISTINCT subsec_id
FROM positions p
WHERE EXISTS(SELECT 1
FROM central c, shops s, districts d
WHERE p.pos_id = c.pos_id AND
c.shop_id = s.shop_id AND
s.distr_id = d.distr_id AND
d.city_id = 2);
Make sure you have indexes on pos_id, shop_id, distr_id, and city_id.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Wieck 1999-03-10 20:33:50 Re: [HACKERS] Re: map
Previous Message Jan Wieck 1999-03-10 20:11:24 Re: [HACKERS] Developers globe

Browse pgsql-sql by date

  From Date Subject
Next Message Dan Lauterbach 1999-03-10 21:00:18 How match percent sign in SELECT using LIKE?
Previous Message Bruce Momjian 1999-03-10 19:11:39 Re: [SQL] Performance