From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: a few crazy ideas about hash joins |
Date: | 2009-04-03 05:48:44 |
Message-ID: | 49D5A33C.7060708@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Robert Haas wrote:
> While investigating some performance problems recently I've had cause
> to think about the way PostgreSQL uses hash joins. So here are a few
> thoughts. Some of these have been brought up before.
>
> 1. When the hash is not expected to spill to disk, it preserves the
> pathkeys of the outer side of the join. If the optimizer were allowed
> to assume that, it could produce significantly more efficient query
> plans in some cases. The problem is what to do if we start executing
> the query and find out that we have more stuff to hash than we expect,
> such that we need multiple batches? Now the results won't be sorted.
> I think we could handle this as follows: Don't count on the hash join
> to preserve pathkeys unless it helps, and only rely on it when it
> seems as if the hash table will still fit even if it turns out to be,
> say, three times as big as expected. But if you are counting on the
> hash join to preserve pathkeys, then pass that information to the
> executor. When the executor is asked to perform a hash join, it will
> first hash the inner side of the relation. At that point, we know
> whether we've succesfully gotten everything into a single batch, or
> not. If we have, perform the join normally. If the worst has
> happened and we've gone multi-batch, then perform the join and sort
> the output before returning it. The performance will suck, but at
> least you'll get the right answer.
>
> Previous in-passing reference to this idea here:
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00806.php
Hmm, instead of a sorting the output if the worst happens, a final merge
step as in a merge sort would be enough.
> 2. Consider building the hash table lazily. I often see query planner
> pick a hash join over a nested inner indexscan because it thinks that
> it'll save enough time making hash probes rather than index probes to
> justify the time spent building the hash table up front. But
> sometimes the relation that's being hashed has a couple thousand rows,
> only a tiny fraction of which will ever be retrieved from the hash
> table. We can predict when this is going to happen because n_distinct
> for the outer column will be much less than the size of the inner rel.
> In that case, we could consider starting with an empty hash table
> that effectively acts as a cache. Each time a value is probed, we
> look it up in the hash table. If there's no entry, we use an index
> scan to find the matching rows and insert them into the hash table.
> Negative results must also be cached.
Yeah, that would be quite nice. One problem is that our ndistinct
estimates are not very accurate.
> 3. Avoid building the exact same hash table twice in the same query.
> This happens more often you'd think. For example, a table may have
> two columns creator_id and last_updater_id which both reference person
> (id). If you're considering a hash join between paths A and B, you
> could conceivably check whether what is essentially a duplicate of B
> has already been hashed somewhere within path A. If so, you can reuse
> that same hash table at zero startup-cost.
That seems like a quite simple thing to do. But would it work for a
multi-batch hash table?
> 4. As previously discussed, avoid hashing for distinct and then
> hashing the results for a hash join on the same column with the same
> operators.
This seems essentially an extension of idea 3.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Tena Sakai | 2009-04-03 05:52:09 | Re: [HACKERS] How would I get rid of trailing blank line? |
Previous Message | Heikki Linnakangas | 2009-04-03 05:30:14 | Re: Documentation Update: Document pg_start_backup checkpoint behavior |