Re: selecting duplicate records

From: Guy Fraser <guy(at)incentre(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: selecting duplicate records
Date: 2003-09-23 20:34:32
Message-ID: 3F70AE58.7030308@incentre.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

This is a test I did on one of my tables where I put duplicated entries :

select *
from acct_other_2003sep
except select sep.*
from (
select min(oid) as min_oid,
"Acct-Status-Type",
"User-Name",
"Realm",
"Acct-Session-Time",
"Acct-Input-Octets",
"Acct-Output-Octets",
"Called-Station-Id",
"Calling-Station-Id",
"Acct-Terminate-Cause",
"Framed-IP-Address",
"Service-Type",
"Framed-Protocol",
"Client-IP-Address",
"NAS-IP-Address",
"NAS-Port-Type",
"NAS-Port-Id",
"Acct-Session-Id",
"Acct-Link-Count",
"Acct-Multi-Session-Id"
from acct_other_2003sep
group by "Acct-Status-Type",
"User-Name",
"Realm",
"Acct-Session-Time",
"Acct-Input-Octets",
"Acct-Output-Octets",
"Called-Station-Id",
"Calling-Station-Id",
"Acct-Terminate-Cause",
"Framed-IP-Address",
"Service-Type",
"Framed-Protocol",
"Client-IP-Address",
"NAS-IP-Address",
"NAS-Port-Type",
"NAS-Port-Id",
"Acct-Session-Id",
"Acct-Link-Count",
"Acct-Multi-Session-Id"
) as min_sep,
acct_other_2003sep as sep
where sep.oid = min_sep.min_oid
;

From the above example you can see how to use a subselect to get a
unique list then using except, you can get the records that were not unique.

This may not be exactly what you want but it does implement some of the
methods required to get around using using temporary tables.

For some tasks using temporary tables may be more suitable if your query
becomes too complex and or you run out of memory/time.

Hope this helps.

Guy

Christopher Browne wrote:

>The world rejoiced as ch(at)rodos(dot)fzk(dot)de (Christoph Haller) wrote:
>
>
>>>1. How to select duplicate records only from a single table using a
>>>
>>>
>>select
>>
>>
>>>query.
>>>
>>>
>>>
>>e.g.
>>select sid,count(sid) from location group by sid having count(sid)>1;
>>
>>Do you get the idea?
>>Your request is pretty unspecific, so if this is not what you're asking
>>for,
>>try again.
>>
>>
>
>The aggregate is likely to perform horrifically badly. Here might
>be an option:
>
>Step 1. Find all of the duplicates...
>
>select a.* into temp table sid from some_table a, some_table b
> where a.oid < b.oid and
> a.field1 = b.field1 and
> a.field2 = b.field2 and
> a.field3 = b.field3 and
> ...
> a.fieldn = b.fieldn;
>
>Step 2. Look for the matching entries in the source table...
>
>select a.* from some_table a, sid b
> where
> a.field1 = b.field1 and
> a.field2 = b.field2 and
> a.field3 = b.field3 and
> ...
> a.fieldn = b.fieldn;
>
>[There's a weakness here; if there are multiple dupes, they may get
>picked multiple times in the second query :-(.]
>
>

--
Guy Fraser
Network Administrator
The Internet Centre
780-450-6787 , 1-888-450-6787

There is a fine line between genius and lunacy, fear not, walk the
line with pride. Not all things will end up as you wanted, but you
will certainly discover things the meek and timid will miss out on.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Kumar 2003-09-24 08:50:22 Compatible datatype of image in SQL server
Previous Message Tom Lane 2003-09-23 17:18:29 Re: pl/pgsql, cursors and C function