Re: One question about transformation ANY Sublinks into joins

From: Armor <yupengstone(at)qq(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: One question about transformation ANY Sublinks into joins
Date: 2016-07-23 14:41:51
Message-ID: tencent_3A145B9449BF65EC58AE4592@qq.com
Lists: pgsql-hackers

After this sublink is pulled up as a semi join, the optimizer considers more than the semi join itself when it builds the join rel: if a unique path can be created for the RHS, it also considers a plain hash join against the unique-ified RHS. For details, see make_join_rel in src/backend/optimizer/path/joinrels.c.
In this case the hash join is cheaper than the hash semi join, so the planner chose the hash join rather than the semi join.
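
To illustrate why a plain join over a unique-ified RHS can stand in for the semi join, here is a minimal sketch. It uses Python's sqlite3 module rather than PostgreSQL, so it only checks result semantics and says nothing about planner behavior or cost. The sample rows are made up for illustration; only the table and column names come from the schema quoted below.

```python
import sqlite3


def run(conn, sql):
    """Return the query result as a sorted list of tuples."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id1 integer, c1 integer);
    CREATE TABLE t2 (id2 integer, c2 integer);
    -- Hypothetical sample data; (1, 10) is duplicated in t2 on purpose.
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO t2 VALUES (1, 10), (1, 10), (2, 99), (3, 30);
""")

# The query from the thread: a correlated IN sublink (semi-join semantics).
semi = run(conn, """
    SELECT * FROM t1
    WHERE id1 IN (SELECT id2 FROM t2 WHERE c1 = c2)
""")

# Plain inner join against a de-duplicated t2, analogous to the
# HashAggregate feeding the Hash node in the plan.
unique_join = run(conn, """
    SELECT t1.* FROM t1
    JOIN (SELECT DISTINCT id2, c2 FROM t2) u
      ON t1.id1 = u.id2 AND t1.c1 = u.c2
""")

# Plain inner join without de-duplication: t1 rows repeat once per match.
naive_join = run(conn, """
    SELECT t1.* FROM t1
    JOIN t2 ON t1.id1 = t2.id2 AND t1.c1 = t2.c2
""")

print("semi join:        ", semi)
print("join over unique: ", unique_join)
print("join, no unique:  ", naive_join)
```

With this data the first two queries return the same rows, while the last one returns the t1 row (1, 10) twice. That is why the plain Hash Join is only valid with the HashAggregate underneath it.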

------------------
Jerry Yu
https://github.com/scarbrofair

------------------ Original ------------------
From: "Robert Haas";<robertmhaas(at)gmail(dot)com>;
Date: Fri, Jul 22, 2016 00:23 AM
To: "Armor"<yupengstone(at)qq(dot)com>;
Cc: "pgsql-hackers"<pgsql-hackers(at)postgresql(dot)org>;
Subject: Re: [HACKERS] One question about transformation ANY Sublinks into joins

On Sun, Jul 17, 2016 at 5:33 AM, Armor <yupengstone(at)qq(dot)com> wrote:
> Hi,
> I ran a simple SQL query with the latest PG:
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..43291.83 rows=1130 width=8)
>    Filter: (SubPlan 1)
>    SubPlan 1
>      ->  Seq Scan on t2  (cost=0.00..38.25 rows=11 width=4)
>            Filter: (t1.c1 = c2)
> (5 rows)
>
> The table schemas are as follows:
>
> postgres=# \d t1
> Table "public.t1"
> Column | Type | Modifiers
> --------+---------+-----------
> id1 | integer |
> c1 | integer |
>
> postgres=# \d t2
> Table "public.t2"
> Column | Type | Modifiers
> --------+---------+-----------
> id2 | integer |
> c2 | integer |
>
> I find that PG decides not to pull up this sublink because the whereClause
> in this sublink refers to Vars of the parent query; for details, please check
> the function convert_ANY_sublink_to_join in
> src/backend/optimizer/plan/subselect.c.
> However, for such a simple sublink, which has no aggregates, no window
> functions, and no LIMIT, maybe we can carefully pull up the predicates in the
> whereClause that refer to Vars of the parent query, then pull up this sublink
> and produce a query plan like the following:
>
> postgres=# explain select * from t1 where id1 in (select id2 from t2 where c1=c2);
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Hash Join  (cost=49.55..99.23 rows=565 width=8)
>    Hash Cond: ((t1.id1 = t2.id2) AND (t1.c1 = t2.c2))
>    ->  Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8)
>    ->  Hash  (cost=46.16..46.16 rows=226 width=8)
>          ->  HashAggregate  (cost=43.90..46.16 rows=226 width=8)
>                Group Key: t2.id2, t2.c2
>                ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

It would need to be a Hash Semi Join rather than a Hash Join, wouldn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
