Speed up Hash Join by teaching ExprState about hashing

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Speed up Hash Join by teaching ExprState about hashing
Date: 2024-05-13 09:23:49
Message-ID: CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm=m0L+Gc=T=kEa4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In master, if you look at ExecHashGetHashValue() in nodeHash.c, you
can see that it calls ExecEvalExpr() and then manually calls the hash
function on the returned value. This process is repeated once for each
hash key. This is inefficient for a few reasons:

1) ExecEvalExpr() will only deform tuples up the max varattno that's
mentioned in the hash key. That means we might have to deform
attributes in multiple steps, once for each hash key.
2) ExecHashGetHashValue() is very branchy and checks if hashStrict[]
and keep_nulls on every loop. There's also a branch to check which
hash functions to use.
3) foreach isn't exactly the pinnacle of efficiency either.

All of the above points can be improved by making ExprState handle
hashing. This means we'll deform all attributes that are needed for
hashing once, rather than incrementally once per key. This also allows
JIT compilation of hashing ExprStates, which will make things even
faster.

The attached patch implements this. Here are some performance numbers.

## Test 1: rows=1000 jit=0

1 hash key
master = 4938.5 tps
patched = 5126.7 tps (+3.81%)

2 hash keys
master = 4326.4 tps
patched = 4520.2 tps (+4.48%)

3 hash keys
master = 4145.5 tps
patched = 4559.7 tps (+9.99%)

## Test 2: rows = 1000000 jit=1 (with opt and inline)

1 hash key
master = 3.663 tps
patched = 3.816 tps (+4.16%)

2 hash keys
master = 3.392 tps
patched = 3.550 tps (+4.67%)

3 hash keys
master = 3.086 tps
patched = 3.411 tps (+10.55%)

Benchmark script attached

Notes:
The ExecBuildHash32Expr() function to build the ExprState isn't called
from the same location as the previous ExecInitExprList() code. The
reason for this is that it's not possible to build the ExprState for
hashing in ExecInitHash() because we don't yet know the jointype and
we need to know that because the expression ExecBuildHash32Expr()
needs to allow NULLs for outer join types. I've put the
ExecBuildHash32Expr() call in ExecInitHashJoin() just after we set
hj_NullOuterTupleSlot and hj_NullOuterTupleSlot fields. I tried
having this code in ExecHashTableCreate(). but that's no good as we
only call that during executor run, which is too late as any SubPlans
in the hash keys need to be attributed to the correct parent. Since
EXPLAIN shows the subplans, this needs to be done before executor run.

I've not hacked on llvmjit_expr.c much before, so I'd be happy for a
detailed review of that code.

I manually checked hashvalues between JIT and non-JIT. They matched.
If we ever consider JITting more granularly, it might be worth always
applying the same jit flags to the hash exprs on either side of the
join. I've slight concerns about compiler bugs producing different
hash codes. Unsure if there are non-bug reasons for them to differ on
the same CPU architecture.

I've not looked at applications of this beyond hash join. I'm
considering other executor nodes to be follow-on material.

Thanks to Andres Freund for mentioning this idea to me.

Attachment Content-Type Size
bench.sh.txt text/plain 1.2 KB
v1-0001-Speed-up-Hash-Join-by-making-ExprStates-hash.patch application/octet-stream 35.1 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2024-05-13 09:24:27 Re: Allowing additional commas between columns, and at the end of the SELECT clause
Previous Message Daniel Gustafsson 2024-05-13 09:22:08 Re: explain format json, unit for serialize and memory are different.