Skip site navigation (1) Skip section navigation (2)

Re: removing specific duplicates

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Novice ML <pgsql-novice(at)postgresql(dot)org>
Subject: Re: removing specific duplicates
Date: 2002-07-15 17:35:50
Message-ID: 1026754555.23763.4.camel@rebel (view raw or flat)
Thread:
Lists: pgsql-novice
Instead of One Big Delete Statement, the other way to do it 
is in a loop in pg/sql.  This is the way I like to do it 
because then I can put progress indicators to show how fast 
or slow the statements are running.
 
On Mon, 2002-07-15 at 09:46, Josh Jore wrote:
> Al,
> First thing you should do is add a serial not null unique column to that
> table. If you are going to need to do this sort of operation then you need
> something by which to get a handle to a specific row.
> 
> So this command will delete exactly one duplicate from your table. You
> need to keep runnin it until it deletes zero rows. Is this idea? No. It's
> one step above worst case. Unless you *like* this sort of pain, fix your
> app to prevent this sort of  duplication. You could have a trigger or rule
> stop duplicate inserts. You could use a unique index. Whatever works for
> you. Just do *something*.
> 
> DELETE FROM temp WHERE oid IN
> 	(
> 	SELECT	temp.oid
> 	FROM	temp
> 	JOIN
> 	( SELECT count(*), sample, spec, temp, measurement
> 	  FROM temp
> 	  GROUP BY sample, spec, temp, measurement ) dup
> 	USING (sample, spec, temp, measurement)
> 	WHERE dup.count > 1
> 	LIMIT 1;
> 
> 
> So your next question. How to replace the existing unique data with a
> single averaage. Requesting the average is easy:
> 
> > sample  spec  temp  voltage  measurement
> > 1           Tdsu  25      4.5       1.12e-9
> > 1           Tdsu  25      4.5       1.3e-9
> 
> SELECT sample, spec, temp, voltage, AVG(measurement)
> FROM temp GROUP BY sample, spec, temp, voltage;
> 
> So if you wanted to keep the average and throw away the detail data then
> just delete the old row and insert the new row (which you saved prior to
> deleting the data)
> 
> 
> Joshua b. Jore ; http://www.greentechnologist.org
> 
> On Fri, 12 Jul 2002, Al Arduengo wrote:
> 
> > I have a table of characterization data from a test of a certain chip where
> > I work. THe table consists of 5 columns:
> >
> > sample int
> > spec text
> > temp int
> > voltage float
> > measurement float
> >
> > Each sample chip (1-120) is tested at three temps and two voltages. The
> > column of interest is measurement. The problem is that some of the samples
> > were tested multiple times for one temp/voltage combination so I have
> > entries such as:
> >
> >
> > My requirement is to delete any duplicate rows (such as #2 in this case)
> > and then take the average of #1 and #3, replace #1 with that average
> > measurement and delete #3. I simply cannot figure out how to first get rid
> > of *just* #2 and then somehow find the case of #1 and #3 existing and then
> > do the average followed by removing #3. Using DISTINCT with SELECT will
> > obviously spit out #1 and #3 but that doesn't get me anywhere. I am
> > somewhat of a SQL novice so please have pity on me in your possible
> > explanations. My ultimate goal is to do these modifications without having
> > to go in and manually find these situations and then manually make the
> > changes. I think it is possible with SQL but I can't figure out how.
> >
> > Thanks very much in advance.
> > -Al Arduengo

-- 
+-----------------------------------------------------------------+
| Ron Johnson, Jr.        Home: ron(dot)l(dot)johnson(at)cox(dot)net             |
| Jefferson, LA  USA      http://ronandheather.dhs.org:81         |
|                                                                 |
| "Experience should teach us to be most on our guard to protect  |
|  liberty when the government's purposes are beneficent. Men     |
|  born to freedom are naturally alert to repel invasion of their |
|  liberty by evil minded rulers. The greatest dangers to liberty |
|  lurk in insidious encroachment by men of zeal, well-meaning    |
|  but without understanding."                                    |
|   Justice Louis Brandeis, dissenting, Olmstead v US (1928)      |
+-----------------------------------------------------------------+


In response to

pgsql-novice by date

Next:From: Chad ThompsonDate: 2002-07-15 18:45:58
Subject: Seeding
Previous:From: Hillensbeck, PrestonDate: 2002-07-15 16:19:31
Subject: Re: POSTGRESQL for WINDOWS NT4

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group