Re: Why enable_hashjoin Completely disables HashJoin

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Quan Zongliang <quanzongliang(at)yeah(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why enable_hashjoin Completely disables HashJoin
Date: 2023-04-03 11:44:03
Message-ID: 01c30483-c4c4-2c19-e571-aa10a65c47c5@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/3/23 12:23, Quan Zongliang wrote:
> Hi,
>
> I found that the enable_hashjoin disables HashJoin completely.
> It's in the function add_paths_to_joinrel:
>
> if (enable_hashjoin || jointype == JOIN_FULL)
>     hash_inner_and_outer(root, joinrel, outerrel, innerrel,
>                 jointype, &extra);
>
> Instead, it should add a disable cost to the cost calculation of
> hashjoin. And now final_cost_hashjoin does the same thing:
>
> if (!enable_hashjoin)
>     startup_cost += disable_cost;
>
>
> enable_mergejoin has the same problem.
>
> Test case:
>
> CREATE TABLE t_score_01(
> s_id int,
> s_score int,
> s_course char(8),
> c_id int);
>
> CREATE TABLE t_student_01(
> s_id int,
> s_name char(8));
>
> insert into t_score_01 values(
> generate_series(1, 1000000), random()*100, 'course', generate_series(1,
> 1000000));
>
> insert into t_student_01 values(generate_series(1, 1000000), 'name');
>
> analyze t_score_01;
> analyze t_student_01;
>
> SET enable_hashjoin TO off;
> SET enable_nestloop TO off;
> SET enable_mergejoin TO off;
>
> explain select count(*)
> from t_student_01 a join t_score_01 b on a.s_id=b.s_id;
>
> After disabling all three, the HashJoin path should still be chosen.
>

It's not clear to me why that behavior would be desirable? Why is this
an issue you need so solve?

AFAIK the reason why some paths are actually disabled (not built at all)
while others are only penalized by adding disable_cost is that we need
to end up with at least one way to execute the query. So we pick a path
that we know is possible (e.g. seqscan) and hard-disable other paths.
But the always-possible path is only soft-disabled by disable_cost.

For joins, we do the same thing. The hash/merge joins may not be
possible, because the data types may not have hash/sort operators, etc.
Nestloop is always possible. So we soft-disable nestloop but
hard-disable hash/merge joins.

I doubt we want to change this behavior, unless there's a good reason to
do that ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-04-03 11:54:29 Re: running logical replication as the subscription owner
Previous Message Onur Tirtir 2023-04-03 11:09:50 RE: [EXTERNAL] Re: [PATCH] Report the query string that caused a memory error under Valgrind