From: | Neil Conway <neilc(at)samurai(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-patches <pgsql-patches(at)postgresql(dot)org> |
Subject: | Re: Hash function for numeric (WIP) |
Date: | 2007-05-04 03:03:59 |
Message-ID: | 1178247839.18303.53.camel@goldbach |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-patches |
On Mon, 2007-30-04 at 00:04 -0400, Tom Lane wrote:
> I'm still not very comfortable with that. You're proposing to add a
> pretty obvious failure mechanism --- any numeric-returning function
> that failed to "normalize" its output would now create a subtle,
> hard-to-find bug.
What about teaching hash_numeric() to explicitly ignore leading and
trailing zero digits?
> Perhaps a suitable test would be to compare the number of
> hash collisions in a large set of randomly-chosen-but-distinct
> numeric values.
Okay, I did a little testing. I created a table with ~2 million
numerics, generated with equal probability by one of the two
expressions:
random()::numeric * ((random() * 100) ^ 20) * 100
random()::numeric * -100;
There were 251 duplicate inputs that I didn't bother eliminating; of
course, they should have effected both hash functions identically.
Results:
neilc=# create temp table r1 as select hash_numeric(a), count(*) from x
group by hash_numeric(a);
neilc=# create temp table r2 as select old_hash_numeric(a), count(*)
from x group by old_hash_numeric(a);
neilc=# select count(*) from r1 where count > 1;
count
--------
123342
(1 row)
neilc=# select count(*) from r2 where count > 1;
count
-------
756
(1 row)
old_hash_numeric() is the hash_any()-based hash, hash_numeric() is my
coding of your suggested hash function (see attached patch).
So it seems we need a better hash if we're not going to use hash_any().
The analysis required to write a robust hash function from scratch is
precisely why I'd prefer to use hash_any() if possible.
-Neil
Attachment | Content-Type | Size |
---|---|---|
numeric_hash_func-4.patch | text/x-patch | 8.9 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2007-05-04 03:57:24 | Re: Hash function for numeric (WIP) |
Previous Message | Tom Lane | 2007-05-04 02:44:42 | Re: document plperl argument and return value representation |