Re: [GENERAL] How to change order sort of table in HashJoin

From: Melvin Davidson <melvin6925(at)gmail(dot)com>
To: Man Trieu <man(dot)trieu(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: [GENERAL] How to change order sort of table in HashJoin
Date: 2016-11-19 14:44:49
Message-ID: CANu8FizPo25iBB0h98LbmwZFV+p1VFznCJ7mkT25Crx9PzLZ7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Sat, Nov 19, 2016 at 12:46 AM, Man Trieu <man(dot)trieu(at)gmail(dot)com> wrote:

> Hi Experts,
>
> As in the example below, i think the plan which hash table is created on
> testtbl2 (the fewer tuples) should be choosen.
> Because creating of hash table should faster in testtbl2. But it did not.
>
> I have tried to change the ordering of table by tuning parameter even if
> using pg_hint_plan but not success.
>
> Why does planner do not choose the plan which hash table is created on
> testtbl2 (which can take less time)?
> And how to change the order?
>
> # I also confirm planner info by rebuild postgresql but not found related
> usefull info about hash table
>
> ---
> postgres=# create table testtbl1(id integer, c1 text, c2 text, c3 text,
> primary key (c1,c2,c3));
> CREATE TABLE
> postgres=# create table testtbl2(id integer, c1 text, c2 text, c3 text,
> primary key (c1,c2,c3));
> CREATE TABLE
> postgres=# insert into testtbl1 select generate_series(1,1000000),
> random()::text,random()::text,random()::text;
> INSERT 0 1000000
> postgres=# insert into testtbl2 select * from testtbl1 where id%7 = 0;
> INSERT 0 142857
>
> postgres=# explain analyze select * from testtbl1 inner join testtbl2
> using(c1,c2,c3);
> QUERY PLAN
> ------------------------------------------------------------
> ---------------------------------------------------------------------
> Hash Join (cost=38775.00..47171.72 rows=1 width=59) (actual
> time=1120.824..1506.236 rows=142857 loops=1)
> Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2)
> AND (testtbl2.c3 = testtbl1.c3))
> -> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56)
> (actual time=0.008..27.964 rows=142857 loops=1)
> -> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual
> time=1120.687..1120.687 rows=1000000 loops=1)
> Buckets: 131072 Batches: 1 Memory Usage: 89713kB
> -> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000
> width=55) (actual time=0.035..458.522 rows=1000000 loops=1)
> Planning time: 0.922 ms
> Execution time: 1521.258 ms
> (8 rows)
>
> postgres=# set pg_hint_plan.enable_hint to on;
> SET
> postgres=# /*+
> postgres*# HashJoin(testtbl1 testtbl2)
> postgres*# Leading(testtbl1 testtbl2)
> postgres*# */
> postgres-# explain analyze select * from testtbl1 inner join testtbl2
> using(c1,c2,c3);
> QUERY PLAN
> ------------------------------------------------------------
> ---------------------------------------------------------------------
> Hash Join (cost=48541.00..67352.86 rows=1 width=59) (actual
> time=1220.625..1799.709 rows=142857 loops=1)
> Hash Cond: ((testtbl2.c1 = testtbl1.c1) AND (testtbl2.c2 = testtbl1.c2)
> AND (testtbl2.c3 = testtbl1.c3))
> -> Seq Scan on testtbl2 (cost=0.00..3039.57 rows=142857 width=56)
> (actual time=0.011..58.649 rows=142857 loops=1)
> -> Hash (cost=21275.00..21275.00 rows=1000000 width=55) (actual
> time=1219.295..1219.295 rows=1000000 loops=1)
> Buckets: 8192 Batches: 32 Memory Usage: 2851kB
> -> Seq Scan on testtbl1 (cost=0.00..21275.00 rows=1000000
> width=55) (actual time=0.021..397.583 rows=1000000 loops=1)
> Planning time: 3.971 ms
> Execution time: 1807.710 ms
> (8 rows)
>
> postgres=#
> ---
>
>
> Thanks and best regard!
>

*AFAIK, the only way to change a sort order is to use the ORDER BY clause
in the SELECT.https://www.postgresql.org/docs/9.4/static/sql-select.html
<https://www.postgresql.org/docs/9.4/static/sql-select.html>"8. If the
ORDER BY clause is specified, the returned rows are sorted in the specified
order. If ORDER BY is not given, the rows are returned in whatever order
the system finds fastest to produce."*

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2016-11-19 14:54:25 Re: Partial update on an postgres upsert violates constraint
Previous Message Adrian Klaver 2016-11-19 14:41:57 Re: Database migration to RDS issues permissions

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2016-11-19 14:51:12 Re: Mail thread references in commits
Previous Message Andrew Dunstan 2016-11-19 14:33:08 Re: Mail thread references in commits