Re: Ambiguous Plan - Larger Table on Hash Side

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Narendra Pradeep U U <narendra(dot)pradeep(at)zohocorp(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Ambiguous Plan - Larger Table on Hash Side
Date: 2018-03-14 04:48:16
Message-ID: CAFjFpRfjJy2LS-uBhPR+u58HcqWKXUDkgt+n_=JqkantFe_82Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 13, 2018 at 4:32 PM, Narendra Pradeep U U
<narendra(dot)pradeep(at)zohocorp(dot)com> wrote:
> Hi,
> Thanks everyone for your suggestions. I would like to add the EXPLAIN
> ANALYZE output of both plans so that we have a broader picture.
>
> I have a work_mem of 1000 MB.
>
> The plan we get regularly, with the tables analyzed:
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                                            QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
>  Hash Left Join  (cost=945515.68..1071064.34 rows=78264 width=4) (actual time=9439.410..20445.620 rows=78264 loops=1)
>    Hash Cond: (tab2.b = tab1.a)
>    ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.006..5.116 rows=78264 loops=1)
>    ->  Hash  (cost=442374.30..442374.30 rows=30667630 width=4) (actual time=9133.593..9133.593 rows=30667722 loops=1)
>          Buckets: 33554432  Batches: 2  Memory Usage: 801126kB
>          ->  Seq Scan on tab1  (cost=0.00..442374.30 rows=30667630 width=4) (actual time=0.030..3584.652 rows=30667722 loops=1)
>  Planning time: 0.055 ms
>  Execution time: 20472.603 ms
> (8 rows)
>
>
>
> I reproduced the other plan by not analyzing the smaller table.
>
> tpch=# explain analyze select b from tab2 left join tab1 on a = b;
>                                                         QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Hash Right Join  (cost=2102.88..905274.97 rows=78039 width=4) (actual time=15.331..7590.406 rows=78264 loops=1)
>    Hash Cond: (tab1.a = tab2.b)
>    ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.046..2697.480 rows=30667722 loops=1)
>    ->  Hash  (cost=1127.39..1127.39 rows=78039 width=4) (actual time=15.133..15.133 rows=78264 loops=1)
>          Buckets: 131072  Batches: 1  Memory Usage: 3776kB
>          ->  Seq Scan on tab2  (cost=0.00..1127.39 rows=78039 width=4) (actual time=0.009..5.516 rows=78264 loops=1)
>  Planning time: 0.053 ms
>  Execution time: 7592.688 ms
> (8 rows)

I am surprised that the estimates are so close to the actual values
even without analysing the small table.
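
One way to see what the planner is working from, even before ANALYZE, is the relation-level statistics in pg_class; the planner extrapolates a row count from the relation's size on disk, which is why the estimate can still come out close. A sketch against the tables from this thread (not output from the actual session):

```sql
-- Without ANALYZE, the planner still derives a reltuples estimate from
-- relpages (the relation's physical size), so row-count estimates for
-- a plain table can be close to reality.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('tab1', 'tab2');
```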

>
>
> The actual plan seems to be slower. In the smaller table (tab2) each
> row is duplicated exactly 8 times, while all the rows in the larger
> table (tab1) are distinct. What may be the exact reason, and can we fix this?

After analysing the small table, the first plan is chosen as the
cheapest. This means the plan that hashes the smaller table is costed
higher than the plan that hashes the larger table. We need to examine
that costing to see what went wrong.
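
A first step in examining that costing is to look at the column statistics the hash join cost model consumes on the join keys. A sketch against the thread's tables (column names a and b assumed from the query):

```sql
-- n_distinct and most_common_freqs on the join keys feed the planner's
-- estimate of hash bucket sizes; heavy duplication on the hashed side
-- (each value in tab2.b appearing 8 times) inflates the estimated cost
-- of probing a hash table built on that side.
SELECT tablename, attname, n_distinct, most_common_freqs
FROM pg_stats
WHERE tablename IN ('tab1', 'tab2')
  AND attname IN ('a', 'b');
```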

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
