How to change order sort of table in HashJoin

From: Man Trieu <man(dot)trieu(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: How to change order sort of table in HashJoin
Date: 2016-11-19 05:46:47
Message-ID: CAPaRaYpADJ74HZD0X+8a5iDRJsgOeOHSX5ZSQiJk+x2SKhBukw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi Experts,

As in the example below, i think the plan which hash table is created on
testtbl2 (the fewer tuples) should be choosen.
Because creating of hash table should faster in testtbl2. But it did not.

I have tried to change the ordering of table by tuning parameter even if
using pg_hint_plan but not success.

Why does planner do not choose the plan which hash table is created on
testtbl2 (which can take less time)?
And how to change the order?

# I also confirm planner info by rebuild postgresql but not found related
usefull info about hash table

---
postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text,
primary key (c1,c2,c3));
CREATE TABLE
postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text,
primary key (c1,c2,c3));
CREATE TABLE
postgres=# insert into testtbl1 select
generate_series(1,1000000),random()::text,random()::text,random()::text;
INSERT 0 1000000
postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;
INSERT 0 142857

postgres=# explain analyze select * from testtbl1 inner join testtbl2
using(c1,c2,c3);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=38775.00..47171.72 rows=1 width=59) (actual
time=1120.824..1506.236 rows=142857 loops=1)
Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2)
AND (testtbl2.c3 = testtbl1.c3))
-> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56)
(actual time=0.008..27.964 rows=142857 loops=1)
-> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual
time=1120.687..1120.687 rows=1000000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 89713kB
-> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000
width=55) (actual time=0.035..458.522 rows=1000000 loops=1)
Planning time: 0.922 ms
Execution time: 1521.258 ms
(8 rows)

postgres=# set pg_hint_plan.enable_hint to on;
SET
postgres=# /*+
postgres*# HashJoin(testtbl1 testtbl2)
postgres*# Leading(testtbl1 testtbl2)
postgres*# */
postgres-# explain analyze select * from testtbl1 inner join testtbl2
using(c1,c2,c3);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=48541.00..67352.86 rows=1 width=59) (actual
time=1220.625..1799.709 rows=142857 loops=1)
Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2)
AND (testtbl2.c3 = testtbl1.c3))
-> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56)
(actual time=0.011..58.649 rows=142857 loops=1)
-> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual
time=1219.295..1219.295 rows=1000000 loops=1)
Buckets: 8192 Batches: 32 Memory Usage: 2851kB
-> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000
width=55) (actual time=0.021..397.583 rows=1000000 loops=1)
Planning time: 3.971 ms
Execution time: 1807.710 ms
(8 rows)

postgres=#
---

Thanks and best regard!

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Fran ... 2016-11-19 13:21:56 Database migration to RDS issues permissions
Previous Message David G. Johnston 2016-11-18 18:42:03 Re: Avoiding double-counting in aggregates with more than one join?

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2016-11-19 10:47:24 Re: Mail thread references in commits
Previous Message Amit Kapila 2016-11-19 05:26:45 Re: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly