Re: Why enable_hashjoin Completely disables HashJoin

From: Quan Zongliang <quanzongliang(at)yeah(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why enable_hashjoin Completely disables HashJoin
Date: 2023-04-04 07:38:37
Message-ID: 5d9f64f5-5051-8b9a-bafa-2c3309e8865b@yeah.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2023/4/3 19:44, Tomas Vondra wrote:
> On 4/3/23 12:23, Quan Zongliang wrote:
>> Hi,
>>
>> I found that the enable_hashjoin disables HashJoin completely.
>> It's in the function add_paths_to_joinrel:
>>
>> if (enable_hashjoin || jointype == JOIN_FULL)
>>     hash_inner_and_outer(root, joinrel, outerrel, innerrel,
>>                 jointype, &extra);
>>
>> Instead, it should add a disable cost to the cost calculation of
>> hashjoin. And now final_cost_hashjoin does the same thing:
>>
>> if (!enable_hashjoin)
>>     startup_cost += disable_cost;
>>
>>
>> enable_mergejoin has the same problem.
>>
>> Test case:
>>
>> CREATE TABLE t_score_01(
>> s_id int,
>> s_score int,
>> s_course char(8),
>> c_id int);
>>
>> CREATE TABLE t_student_01(
>> s_id int,
>> s_name char(8));
>>
>> insert into t_score_01 values(
>> generate_series(1, 1000000), random()*100, 'course', generate_series(1,
>> 1000000));
>>
>> insert into t_student_01 values(generate_series(1, 1000000), 'name');
>>
>> analyze t_score_01;
>> analyze t_student_01;
>>
>> SET enable_hashjoin TO off;
>> SET enable_nestloop TO off;
>> SET enable_mergejoin TO off;
>>
>> explain select count(*)
>> from t_student_01 a join t_score_01 b on a.s_id=b.s_id;
>>
>> After disabling all three, the HashJoin path should still be chosen.
>>
>
> It's not clear to me why that behavior would be desirable? Why is this
> an issue you need so solve?
>
Because someone noticed that when he set enable_hashjoin,
enable_mergejoin and enable_nestloop to off. The statement seemed to get
stuck (actually because it chose the NestedLoop path, which took a long
long time to run).
If enable_hashjoin and enable_nestloop disable generating these two
paths. Then enable_nestloop should do the same thing, but it doesn't.

> AFAIK the reason why some paths are actually disabled (not built at all)
> while others are only penalized by adding disable_cost is that we need
> to end up with at least one way to execute the query. So we pick a path
> that we know is possible (e.g. seqscan) and hard-disable other paths.
> But the always-possible path is only soft-disabled by disable_cost.
>
> For joins, we do the same thing. The hash/merge joins may not be
> possible, because the data types may not have hash/sort operators, etc.
> Nestloop is always possible. So we soft-disable nestloop but
> hard-disable hash/merge joins.
>
> I doubt we want to change this behavior, unless there's a good reason to
> do that ...
It doesn't have to change. Because selecting NestedLoop doesn't really
get stuck either. It just takes too long to run.

I will change the patch status to Withdrawn.
>
>
> regards
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2023-04-04 07:48:39 Re: Minimal logical decoding on standbys
Previous Message Hayato Kuroda (Fujitsu) 2023-04-04 07:00:01 [PoC] pg_upgrade: allow to upgrade publisher node