Re: Filtering duplicated row with a trigger

From: "Partha Sur" <p(dot)sur(at)worldnet(dot)att(dot)net>
To: "papapep" <papapep(at)gmx(dot)net>
Cc: <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Filtering duplicated row with a trigger
Date: 2003-10-07 00:27:41
Message-ID: 003b01c38c69$d624da60$3ead590c@welcome
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice


Hello,

If you wish to "clean" a table of duplicates the following SQL will work.
It is used widely in the Oracle world - and I just tried it successfully on
my Red Hat Linux 9 PostgreSQL 7.3.2 database.

If you have a table T with columns C1, C2, ... Cn and C1, C2, ... Cr are
the
candidate keys for a concatenated primary key (r <= n), then if T currently
contains duplicates (obviously with with no primary key constraint defined
on it - otherwise such a table with duplicate rows could not exist) then to
remove the duplicates so that only tuples with unique candidate key values
remain (i.e. cleaned) run the following delete SQL which uses a correlated
subquery :

delete from T t1
where oid < (select max (oid)
from T t2
where t1.C1 = t2.C1
and t1.C2 = t2.C2
...
and t1.Cr = t2.Cr) ;

This is based on the fact that though rows in a table may be duplicate they
must always have unique oid numbers.

Then with the above delete statement, only the row with the maximum value
for oid for a particular set of duplicate rows (for a particular value of
C1||C2...||Cr) will remain.
In the above SQL if you use instead:

where oid > (select min (oid) ...

then only the row with the minimum value of oid will remain ...

So \copy to a temp table (with no primary key constraints defined) and then
apply SQL similar to above to clean the loaded temp table which can then be
used to load the target production table with no duplicates.

Regards,

Partha Sur

> ----- Original Message -----
> From: "papapep" <papapep(at)gmx(dot)net>
> To: "pgsql-novice" <pgsql-novice(at)postgresql(dot)org>
> Sent: Monday, October 06, 2003 12:30 PM
> Subject: [NOVICE] Filtering duplicated row with a trigger
>
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > I've got plenty of data files (prepared to be inserted with the \copy
> > statement) but I have to filter them to be sure that there are no
> > duplicated rows inserted.
> > I know I should do it with a trigger that executes a function before
> > inserting the row and if it's duplicated do something with it (insert it
> > in another table, simply forget it, etc...). The theory is clear :-)
> > But the practice is not so clear (for me, of course).
> > Anyone can give me some guide to how the function should do the control
> > of duplicated rows?
> >
> > Thanks.
> >
> > Josep Sànchez
> > [papapep]
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.2.1 (MingW32)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> >
> > iD8DBQE/gZik2vx52x0kyz4RApbZAKCrhvCywbNH8Zce0xpfDhMNQBfQ+ACfShG6
> > 96nY7di8KnV8gJrcWIOzqLI=
> > =32il
> > -----END PGP SIGNATURE-----
> >
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
>

Browse pgsql-novice by date

  From Date Subject
Next Message zhix 2003-10-07 00:38:25 Re: SELECT syntax question - combining COUNT and DISTINCT
Previous Message Bruno Wolff III 2003-10-06 18:01:41 Re: [personal] Re: Filtering duplicated row with a trigger