Quick Links

SQL assistance, please...

From:	Eric Walstad <eric(at)ericwalstad(dot)com>
To:	sfpug(at)postgresql(dot)org
Subject:	SQL assistance, please...
Date:	2006-02-15 03:59:43
Message-ID:	200602141959.43785.eric@ericwalstad.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	sfpug

Hello everyone,

My addresses table includes a hash of the address, which I use to speed
searches. The searches are working well, but occasionally I find that there
are hash collisions like the following:

select distinct addr1, address_hash from addresses where address_hash =
'-1394669317' order by addr1;
addr1 | address_hash
-----------------------+--------------
123 SOME STREET | -1394669317
543 ANOTHER STREET | -1394669317

That is, totally different addresses happen to result in the same hash value.
I'd like to find out how often this is happening so I can see if I need to
change hash algorithms.

My question is: how can I find all such instances like this in my table?

I'd like something like:
where a.addr1 != b.addr1 and a.address_hash = b.address_hash
but I'm not sure how to join in the table twice in such a way that I don't get
a ton of duplicate records. Also, I'm not sure what this pattern is called,
so I don't know what to google for.

Any pointers are greatly apprceiated. Many thanks in advance,

Eric.

additional info...

The table looks like this:
uv2=# \d addresses

The address hash is generated from a concatenation of addr1 - zip4 fields
(normalized: case insensitive, no punctuation, common replacements (street ->
st), etc.). Ideally, I'd like to find cases where any of the address fields
differ but the address_hash is the same.

Responses

Re: SQL assistance, please... at 2006-02-15 04:45:34 from Jeff Frost

Browse sfpug by date

	From	Date	Subject
Next Message	Jeff Frost	2006-02-15 04:45:34	Re: SQL assistance, please...
Previous Message	Eric Walstad	2006-02-14 21:31:42	Re: invalid byte sequence for encoding "UNICODE": 0xd9