Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL
Date: 2005-05-11 00:14:26
Message-ID: 1115770466.42814e62cb0e1@webmail.telus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Quoting Mark Lewis <mark(dot)lewis(at)mir3(dot)com>:

> If the original paper was published in 1984, then it's been more than
> 20 years. Any potential patents would already have expired, no?

Don't know, but the idea is pervasive among different vendors ...
perhaps that's a clue.

And having now read beyond the start of ExecHashJoin(), I can see that
PG does indeed implement Grace hash; and the implementation is nice and
clean.

If there were room for improvement, (and I didn't see it in the source)
it would be the logic to:

- swap inner and outer inputs (batches) when the original inner turned
out to be too large for memory, and the corresponding outer did not. If
you implement that anyway (complicates the loops) then it's no trouble
to just hash the smaller of the two, every time; saves some CPU.

- recursively partition batches where both inner and outer input batch
ends up being too large for memory, too; or where the required number of
batch output buffers alone is too large for working RAM. This is only
for REALLY big inputs.

Note that you don't need a bad hash function to get skewed batch sizes;
you only need a skew distribution of the values being hashed.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Neil Conway 2005-05-11 00:39:56 Re: [GENERAL] SECURITY RELEASES: 7.2.8 - 7.3.10 - 7.4.8 - 8.0.3
Previous Message Greg Stark 2005-05-10 23:56:15 Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL

Browse pgsql-performance by date

  From Date Subject
Next Message Christopher Kings-Lynne 2005-05-11 01:24:10 Re: Partitioning / Clustering
Previous Message Greg Stark 2005-05-10 23:56:15 Re: [PERFORM] "Hash index" vs. "b-tree index" (PostgreSQL