Re: remote duplicate rows

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: remote duplicate rows
Date: 2006-09-14 02:52:29
Message-ID: 4508C3ED.6050805@cox.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 09/13/06 19:36, ljb wrote:
> junkone1(at)gmail(dot)com wrote:
>> hI
>> i have a bad situation that i did not have primary key. so i have a
>> table like this
>> colname1 colname2
>> 1 apple
>> 1 apple
>> 2 orange
>> 2 orange
>>
>> It is a very large table. how do i remove the duplctes quickly annd
>> without much change.

Since the 2 colname1 == 2 records are different (extra spaces in
colname2), how do you determine which is the correct record? (Or is
the extra space just an artifact?)

> Make a new table (with a primary key) and the same columns in order,
> and do: INSERT INTO newtable SELECT DISTINCT * FROM oldtable;

That's a single transaction, and since this is a "very large table",
it would be very unpleasant if it rolled back at 95%.

Of course, we don't know what junkone1's definition of "very large"
is and how beefy his hardware is...

If there are OIDs on the table, you could write a script with this
pseudocode, which because of the "candidate key table", transaction
block and LIMIT TO, allows the script to be restated. Niceties like
printing timestamp and a counter after every commit are always helpful.

CREATE TABLE BIGTABLE_PK (
COLNAME1 INTEGER);

INSERT INTO BIGTABLE_PK
SELECT DISTINCT COLNAME1
FROM BIGTABLE;

CREATE INDEX I_BIGTABLE_PK
ON BIGTABLE_PK (COLNAME1)
TYPE IS SORTED;

DECLARE LOOP_FLAG INTEGER = 1;
WHILE LOOP_FLAG DO
BEGIN TRANSACTION
FOR :X AS EACH ROW OF
SELECT COLNAME1
FROM BIGTABLE_PK
ORDER BY COLNAME1
LIMIT TO 2000 ROWS
DO
DELETE FROM BIGTABLE
WHERE OID IN (SELECT OID
FROM BIGTABLE_PK
WHERE COLNAME1 = :X.COLNAME1
LIMIT TO 1 ROWS);
DELETE FROM BIGTABLE_PK
WHERE COLNAME1 = :X.COLNAME1;
END FOR;
IF (SELECT COUNT(*) FROM BIGTABLE_PK) == 0 THEN
SET LOOP_FLAG = 0;
END IF;
COMMIT;
END ;

- --
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFCMPtS9HxQb37XmcRArYMAKCGEK7ft1PLprnHtpjsLYlgs4t5gACdEksT
JY42ieEmRvehOsuU/o6YFR8=
=MJhV
-----END PGP SIGNATURE-----

In response to

Browse pgsql-general by date

  From Date Subject
Next Message a 2006-09-14 04:47:22 Re: select unique items in db
Previous Message David Link 2006-09-14 01:46:37 oracle listener intercept