Re: Ambigous Plan - Larger Table on Hash Side

From: Narendra Pradeep U U <narendra(dot)pradeep(at)zohocorp(dot)com>
To: "Narendra Pradeep U U" <narendra(dot)pradeep(at)zohocorp(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com>
Subject: Re: Ambigous Plan - Larger Table on Hash Side
Date: 2018-03-16 06:42:11
Message-ID: 1622d8b6134.b21d68ba20339.8548376094261380043@zohocorp.com
Lists: pgsql-hackers

Hi Jeff,

I repeated the same query with a work_mem of 2000MB. It is faster than the run with two batches but still slower than hashing the smaller table. So in this case it makes more sense to hash the smaller table (less execution time and a smaller hash table).
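For reference, the higher setting was applied per session along these lines (a sketch; the value matches the run described above):

```sql
-- Raise work_mem for this session only, so the hash table
-- fits in a single batch instead of spilling to two.
SET work_mem = '2000MB';
SHOW work_mem;

-- ... run the query ...

-- Restore the server default afterwards.
RESET work_mem;
```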

Explain analyze with higher work_mem (2 GB)

tpch=# explain analyze select b from tab2 left join tab1 on a = b;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=825722.33..830863.00 rows=78264 width=4) (actual time=7876.746..7905.323 rows=78264 loops=1)
   Hash Cond: (tab2.b = tab1.a)
   ->  Seq Scan on tab2  (cost=0.00..1129.64 rows=78264 width=4) (actual time=0.007..4.268 rows=78264 loops=1)
   ->  Hash  (cost=442375.48..442375.48 rows=30667748 width=4) (actual time=7834.214..7834.214 rows=30667722 loops=1)
         Buckets: 33554432  Batches: 1  Memory Usage: 1340307kB
         ->  Seq Scan on tab1  (cost=0.00..442375.48 rows=30667748 width=4) (actual time=0.047..2267.995 rows=30667722 loops=1)
 Planning time: 0.052 ms
 Execution time: 7953.913 ms
(8 rows)

Yeah, EXPLAIN ANALYZE distorted the timing; the actual query is twice as fast.

Still, a 2GB work_mem is not feasible. Is there a way to avoid this plan (in particular, to keep the larger table off the hash side), or is there any workaround available?
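One session-level workaround I have been experimenting with (a sketch only, using the tab1/tab2 query from above; disabling hash joins may just trade this plan for a slower merge or nested-loop join, so it needs to be measured case by case):

```sql
-- Discourage the hash join for this session only; the planner
-- falls back to another join method (e.g. merge join) if one exists.
SET enable_hashjoin = off;
EXPLAIN ANALYZE SELECT b FROM tab2 LEFT JOIN tab1 ON a = b;
-- Restore the default so other queries are unaffected.
RESET enable_hashjoin;
```

Note that for this LEFT JOIN the planner could in principle also hash the smaller table by producing a Hash Right Join, so the question is really why it prefers hashing the larger side here.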

Thanks,

Pradeep

---- On Thu, 15 Mar 2018 00:40:27 +0530 Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote ----

On Tue, Mar 13, 2018 at 4:02 AM, Narendra Pradeep U U <narendra(dot)pradeep(at)zohocorp(dot)com> wrote:

Hi,

Thanks everyone for your suggestions. I would like to add the EXPLAIN ANALYZE output of both plans so that we can have a broader picture.

I have a work_mem of 1000 MB.

Is it possible to repeat with 2000MB or 3000MB? It would be interesting to see what the estimated cost and what the actual time would be if there were only 1 batch rather than 2.

Also, can you repeat all of these with EXPLAIN (ANALYZE, TIMING OFF) ? Sometimes the act of measuring the times can distort the times by quite a bit. (It will still give an overall execution time, it just won't try to attribute that time to the individual steps)

Cheers,

Jeff
