EXCEPT Queries

From: Mark Mitchell <mark(at)lapcrew(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: EXCEPT Queries
Date: 2003-02-19 19:16:22
Message-ID: 3E53D806.5030501@lapcrew.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

I have a query that I must dedup using data in another table.
This can be done quiet easily using an EXCEPT clause as long as both
queries return the same result set, the problem with that is I only want
to dedup based on a single columns value not dedup based on entire rows.
The only other way I can see of doing this is using a NOT IN () clause,
this seems to take much more time and resources than an EXCEPT
statement. Is there any way to quickly dedup two tables based on only
one row?

Here is an example of what I'm currently doing.

TABLE "A"
"SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
--------------------------------------
BOB | 000001
JOE | 000002

TABLE "B"
"SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
--------------------------------------
BOB | 000001

To dedup table "A" using the data in table "B" I could use the
following, except that the dedup takes place on the whole row when I
only want it to take place on the "ACCOUNT_NUMBER" column.

SELECT
"A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
FROM "A"
EXCEPT
SELECT
"B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER"
FROM "B"

I could use this but it takes WAY to long when both result sets contain
more than a few hundred records. It seems to increase exponentially the
more records you add to each result set. The data sets I'm working with
contain anywhere from 0 to 500000 records.

SELECT
"A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
FROM "A"
WHERE
"A"."ACCOUNT_NUMBER"
NOT IN
(
SELECT
"B"."ACCOUNT_NUMBER"
FROM "B"
)

--
Mark Mitchell
www.lapcrew.com
Registered Linux User # 299662
Mandrake 8.0 , RedHat 8.0

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Richard Huxton 2003-02-19 20:08:00 Re: simple join problem
Previous Message Guy Fraser 2003-02-19 19:06:56 Re: Passing arrays