Re: SQL assistance, please...

From: Eric Walstad <eric(at)ericwalstad(dot)com>
To: sfpug(at)postgresql(dot)org
Subject: Re: SQL assistance, please...
Date: 2006-02-15 06:34:25
Message-ID: 200602142234.26196.eric@ericwalstad.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: sfpug

Hi Jeff,

On Tuesday 14 February 2006 20:45, Jeff Frost wrote:
> Eric,
>
> You probably want to run a query with a self join like this:
>
> SELECT a1.addr1, a1.address_hash FROM address a1, address a2 WHERE
> a1.address_hash = a2.address_hash AND
> a1.addr1 != a2.addr1;

Yes, I tried this, but it gives me duplicate (well, almost duplicate) results
like:

id1 | id2 | address_hash | a1_addr1 | a2_addr1
---------+---------+--------------+------------------+----------------
1986889 | 1150278 | -1000076938 | 123 My Street | 345 Your Street
1150278 | 1986889 | -1000076938 | 345 Your Street | 123 My Street

Also, it only looks for cases where the addr1 fields differ. This is better
than nothing; I can count the records and divide by two to get a total count
of collisions.

Thanks for mentioning this. Maybe I was closer than I thought. I suppose I
could OR together the comparisons of the other fields (a1.addr1 != a2.addr1
OR a1.addr2 != a2.addr2, etc) to look for instances of other mismatched
fields.

> You should probably also consider placing a unique constraint on
> address_hash so you'll get an error when attempting to insert a duplicate.
Unfortunately, I can't do this. This table data is mostly static and defined
by others. It's my job to be able to match my user's inputs to the data in
this table. My best option, I think, it to change hashing algorithms if the
number of collisions is 'too large'.

Thanks again,

Eric.

In response to

Browse sfpug by date

  From Date Subject
Next Message chris mungall 2006-02-20 05:49:48 Positions available at LBL: bio-ontologies
Previous Message Jeff Frost 2006-02-15 04:45:34 Re: SQL assistance, please...