Re: [GENERAL] Recursive optimization of IN subqueries

From: Dennis Haney <davh(at)diku(dot)dk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Recursive optimization of IN subqueries
Date: 2004-01-23 18:36:59
Message-ID: 401169CB.5030500@diku.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:

>Dennis Haney <davh(at)diku(dot)dk> writes:
>
>
>>I saw it as though convert_IN_to_join rewrote the query from
>>
>>
>>select a.* from tenk1 a where a.unique1 in
>>(select c.thousand from tenk1 c where c.hundred = 99);
>>
>>
>>to
>>
>>
>>select a.* from tenk1 a, tenk1 c where a.unique1 = c.thousand AND
>>c.hundred = 99;
>>
>>
>>but after looking at it, I've reached the conclusion that the rewrite is
>>to this instead:
>>
>>
>>select a.* from tenk1 a, (select d.thousand from tenk1 d where
>>d.hundred = 99) as c where a.unique1 = c.thousand;
>>
>>
>
>Right. We do that, and then subsequently pull_up_subqueries transforms
>it to the other representation. The reason for this two-step approach
>is that the intermediate form is still a useful improvement if the
>subquery can't be pulled up for some reason (e.g., it's got grouping).
>
>
With improvement I can see that it can be materialized and thus used as
a normal table in the planner. Is there any additional reasons that I
can't see?
But this limited optimization makes me wonder, why the limitation to
optimizing '='?
And why must the lefthand of the sublink be a variable of the upper query?

>>except the subselect is added as a range table entry instead of a
>>subselect in the from-list (not that I understand this particular part,
>>do you mind explaining?).
>>
>>
>
>Same thing. Every entry in the from-list will have both an RTE and an
>entry in the join tree. This representation is partly historical
>(before we had outer joins, there was only the range table and no join
>tree at all), but it is convenient for many purposes.
>
>

Then I don't understand why it gives two different execution plans? And
the Query* is totally different for the two, eg. there is no RTE for the
subquery in the first query:

davh=# explain select a.* from test1 a, (select num from test1 where id = 2) as b where a.num = b.num;
QUERY PLAN
------------------------------------------------------------------------------------
Hash Join (cost=4.83..29.94 rows=11 width=8)
Hash Cond: ("outer".num = "inner".num)
-> Seq Scan on test1 a (cost=0.00..20.00 rows=1000 width=8)
-> Hash (cost=4.82..4.82 rows=2 width=4)
-> Index Scan using test1_pkey on test1 (cost=0.00..4.82 rows=2 width=4)
Index Cond: (id = 2)
(6 rows)

davh=# explain select a.* from test1 a where a.num in (select num from test1 where id = 2);
QUERY PLAN
------------------------------------------------------------------------------------
Hash IN Join (cost=4.83..28.75 rows=6 width=8)
Hash Cond: ("outer".num = "inner".num)
-> Seq Scan on test1 a (cost=0.00..20.00 rows=1000 width=8)
-> Hash (cost=4.82..4.82 rows=2 width=4)
-> Index Scan using test1_pkey on test1 (cost=0.00..4.82 rows=2 width=4)
Index Cond: (id = 2)
(6 rows)

>PS: this is a bit off-topic for pgsql-general, please pursue it on
>-hackers if you have more questions.

ok

--
Dennis

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Reece Hart 2004-01-23 18:48:27 Re: psql: relocation error: psql: undefined
Previous Message Lee Harr 2004-01-23 18:29:28 Re: sequence in schema -- broken default

Browse pgsql-hackers by date

  From Date Subject
Next Message Neil Conway 2004-01-23 20:20:31 Re: 7.5 change documentation
Previous Message Marc G. Fournier 2004-01-23 18:04:13 Re: [HACKERS] PostgreSQL installation CD based on