Costing bug in hash join logic for semi joins

From: RK Korlapati <rkkorlapati(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Costing bug in hash join logic for semi joins
Date: 2018-07-11 17:09:04
Message-ID: CAHupG3gMjnOFpnL=zPaMhOibJ5J4qJ5yH6GjY0dJV5uQAdJcRA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

There is a costing bug in hash join logic seems to have been introduced by
the patch related to inner_unique enhancements(commit:
9c7f5229ad68d7e0e4dd149e3f80257893e404d4). Specifically, "hashjointuples"
which tracks the number of matches for hash clauses is computed wrong for
inner unique scenario. This leads to lot of semi-joins incorrectly turn to
inner joins with unique on inner side. Function "final_cost_hashjoin" has
special handling to cost semi/anti joins to account for early stop after
the first match. This is enhanced by the above said commit to be used for
inner_unique scenario also. However, "hashjointuples" computation is not
fixed to handle this new scenario which is incorrectly stepping into the
anti join logic and assuming unmatched rows. Fix is to handle inner_unique
case when computing "hashjointuples". Here is the outline of the code that
shows the bug.

void
final_cost_hashjoin(PlannerInfo *root, HashPath *path,
JoinCostWorkspace *workspace,
JoinPathExtraData *extra)
{
.....
/* CPU costs */

* if (path->jpath.jointype == JOIN_SEMI ||
path->jpath.jointype == JOIN_ANTI || extra->inner_unique)*
{
......

* /* Get # of
tuples that will pass the basic join */ if
(path->jpath.jointype == JOIN_SEMI) hashjointuples =
outer_matched_rows; else hashjointuples =
outer_path_rows - outer_matched_rows; *
}
else
{
.....
}
}

Thanks, RK (Salesforce)

Browse pgsql-hackers by date

  From Date Subject
Next Message Emre Hasegeli 2018-07-11 17:13:15 Re: [PATCH] Improve geometric types
Previous Message Heikki Linnakangas 2018-07-11 17:07:46 Re: cursors with prepared statements