While investigating some performance problems recently I've had cause
to think about the way PostgreSQL uses hash joins. So here are a few
thoughts. Some of these have been brought up before.
1. When the hash is not expected to spill to disk, it preserves the
pathkeys of the outer side of the join. If the optimizer were allowed
to assume that, it could produce significantly more efficient query
plans in some cases. The problem is what to do if we start executing
the query and find out that we have more stuff to hash than we expect,
such that we need multiple batches? Now the results won't be sorted.
I think we could handle this as follows: Don't count on the hash join
to preserve pathkeys unless it helps, and only rely on it when it
seems as if the hash table will still fit even if it turns out to be,
say, three times as big as expected. But if you are counting on the
hash join to preserve pathkeys, then pass that information to the
executor. When the executor is asked to perform a hash join, it will
first hash the inner side of the relation. At that point, we know
whether we've succesfully gotten everything into a single batch, or
not. If we have, perform the join normally. If the worst has
happened and we've gone multi-batch, then perform the join and sort
the output before returning it. The performance will suck, but at
least you'll get the right answer.
Previous in-passing reference to this idea here:
2. Consider building the hash table lazily. I often see query planner
pick a hash join over a nested inner indexscan because it thinks that
it'll save enough time making hash probes rather than index probes to
justify the time spent building the hash table up front. But
sometimes the relation that's being hashed has a couple thousand rows,
only a tiny fraction of which will ever be retrieved from the hash
table. We can predict when this is going to happen because n_distinct
for the outer column will be much less than the size of the inner rel.
In that case, we could consider starting with an empty hash table
that effectively acts as a cache. Each time a value is probed, we
look it up in the hash table. If there's no entry, we use an index
scan to find the matching rows and insert them into the hash table.
Negative results must also be cached.
3. Avoid building the exact same hash table twice in the same query.
This happens more often you'd think. For example, a table may have
two columns creator_id and last_updater_id which both reference person
(id). If you're considering a hash join between paths A and B, you
could conceivably check whether what is essentially a duplicate of B
has already been hashed somewhere within path A. If so, you can reuse
that same hash table at zero startup-cost.
4. As previously discussed, avoid hashing for distinct and then
hashing the results for a hash join on the same column with the same
Thoughts on the value and/or complexity of implementation of any of these?
pgsql-hackers by date
|Next:||From: Robert Haas||Date: 2009-04-03 02:26:46|
|Subject: Re: benchmarking the query planner|
|Previous:||From: Andrew Dunstan||Date: 2009-04-03 01:34:16|
|Subject: Re: [SQL] How would I get rid of trailing blank line?|