Re: Ambiguous Plan - Larger Table on Hash Side

From: Narendra Pradeep U U <narendra(dot)pradeep(at)zohocorp(dot)com>
To: "Ashutosh Bapat" <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ambiguous Plan - Larger Table on Hash Side
Date: 2018-03-13 11:02:36
Message-ID: 1621f06b77d.123cd6384694.5336403266862524540@zohocorp.com
Lists: pgsql-hackers

Hi,

Thanks everyone for your suggestions. I would like to add the EXPLAIN ANALYZE output of both plans so that we have a broader picture.

I have work_mem set to 1000 MB.
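In case anyone wants to replay the plans below, the same value can be applied per session with the standard SET/SHOW commands (only the value matters here, not how it is applied):

tpch=# SET work_mem = '1000MB';
tpch=# SHOW work_mem;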

The plan we regularly get, with both tables analyzed:

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=945515.68..1071064.34 rows=78264 width=4) (actual time=9439.410..20445.620 rows=78264 loops=1)
   Hash Cond: (tab2.b = tab1.a)
   ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.006..5.116 rows=78264 loops=1)
   ->  Hash  (cost=442374.30..442374.30 rows=30667630 width=4) (actual time=9133.593..9133.593 rows=30667722 loops=1)
         Buckets: 33554432  Batches: 2  Memory Usage: 801126kB
         ->  Seq Scan on tab1  (cost=0.00..442374.30 rows=30667630 width=4) (actual time=0.030..3584.652 rows=30667722 loops=1)
 Planning time: 0.055 ms
 Execution time: 20472.603 ms
(8 rows)

I reproduced the other plan by not analyzing the smaller table.

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=2102.88..905274.97 rows=78039 width=4) (actual time=15.331..7590.406 rows=78264 loops=1)
   Hash Cond: (tab1.a = tab2.b)
   ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.046..2697.480 rows=30667722 loops=1)
   ->  Hash  (cost=1127.39..1127.39 rows=78039 width=4) (actual time=15.133..15.133 rows=78264 loops=1)
         Buckets: 131072  Batches: 1  Memory Usage: 3776kB
         ->  Seq Scan on tab2  (cost=0.00..1127.39 rows=78039 width=4) (actual time=0.009..5.516 rows=78264 loops=1)
 Planning time: 0.053 ms
 Execution time: 7592.688 ms
(8 rows)


The plan that is actually chosen (the first one, which hashes the larger table) turns out to be slower. In the smaller table (tab2) each row is duplicated exactly 8 times, while all the rows in the larger table (tab1) are distinct. What may be the exact reason, and can we fix this?
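(The distribution can be double-checked with plain count queries on the two join columns; nothing here is specific to my setup beyond the table and column names used above:

tpch=# select count(*) as total_rows, count(distinct b) as distinct_values from tab2;
tpch=# select count(*) as total_rows, count(distinct a) as distinct_values from tab1;

For tab2, total_rows / distinct_values should come out as exactly 8.)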

P.S. I have also attached an SQL file (join_plan.sql) to reproduce this.
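In case the attachment is inconvenient, the data can be approximated along these lines; this is only a rough sketch using generate_series to match the row counts quoted above, and the attached join_plan.sql remains the authoritative script:

-- rough sketch only; the exact setup is in the attached join_plan.sql
create table tab1 (a int);
create table tab2 (b int);

-- larger table: ~30.6 million distinct values
insert into tab1 select g from generate_series(1, 30667722) g;

-- smaller table: 78,264 rows, each value repeated exactly 8 times (9,783 distinct values)
insert into tab2 select (g % 9783) + 1 from generate_series(1, 78264) g;

analyze tab1;
analyze tab2;   -- skip this step to reproduce the second (Hash Right Join) plan

explain analyze select b from tab2 left join tab1 on a = b;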

---- On Tue, 13 Mar 2018 12:42:12 +0530 Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote ----

On Mon, Mar 12, 2018 at 10:02 PM, Narendra Pradeep U U
<narendra(dot)pradeep(at)zohocorp(dot)com> wrote:
> Hi,
>
> Recently I came across a case where the planner chose the larger table on
> the hash side. I am not sure whether it is intended behavior or we are
> missing something.
>
> I have two tables (a and b), each with a single column. One table, 'a', is
> large with around 30 million distinct rows, and the other table, 'b', has
> merely 70,000 rows with one-seventh (10,000) distinct values. I have analyzed
> both tables. But when joining the two tables, I get the larger table on the
> hash side.
>
> tpch=# explain select b from b left join a on a = b;
>                                         QUERY PLAN
> ---------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=824863.75..950104.42 rows=78264 width=4)
>    Hash Cond: (b.b = a.a)
>    ->  Foreign Scan on b  (cost=0.00..821.64 rows=78264 width=4)
>          CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849879
>          CStore File Size: 314587
>    ->  Hash  (cost=321721.22..321721.22 rows=30667722 width=4)
>          ->  Foreign Scan on a  (cost=0.00..321721.22 rows=30667722 width=4)
>                CStore File: /home/likewise-open/pg96/data/cstore_fdw/1818708/1849876
>                CStore File Size: 123236206
> (9 rows)
>
> I would like to know the reason for choosing this plan, and is there an easy
> fix to prevent such plans (especially ones like this where the larger table
> is chosen for the hash)?

A plan with the larger table being hashed isn't necessarily a badly
performing one. During partition-wise join analysis I have seen plans
with the larger table being hashed perform better than plans with the
smaller table being hashed, but I have also seen the other way around.
However, I don't know an easy way to force which side of the join gets
hashed; I tried that under the debugger. In your case, if you run
EXPLAIN ANALYZE on this query and produce the outputs of the two
plans, one with the larger table being hashed and one with the smaller
one being hashed, you will see which of them performs better.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment: join_plan.sql (application/octet-stream, 371 bytes)
