Re: removing specific duplicates

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Novice ML <pgsql-novice(at)postgresql(dot)org>
Subject: Re: removing specific duplicates
Date: 2002-07-15 17:35:50
Message-ID: 1026754555.23763.4.camel@rebel
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Instead of One Big Delete Statement, the other way to do it
is in a loop in pg/sql. This is the way I like to do it
because then I can put progress indicators to show how fast
or slow the statements are running.

On Mon, 2002-07-15 at 09:46, Josh Jore wrote:
> Al,
> First thing you should do is add a serial not null unique column to that
> table. If you are going to need to do this sort of operation then you need
> something by which to get a handle to a specific row.
>
> So this command will delete exactly one duplicate from your table. You
> need to keep runnin it until it deletes zero rows. Is this idea? No. It's
> one step above worst case. Unless you *like* this sort of pain, fix your
> app to prevent this sort of duplication. You could have a trigger or rule
> stop duplicate inserts. You could use a unique index. Whatever works for
> you. Just do *something*.
>
> DELETE FROM temp WHERE oid IN
> (
> SELECT temp.oid
> FROM temp
> JOIN
> ( SELECT count(*), sample, spec, temp, measurement
> FROM temp
> GROUP BY sample, spec, temp, measurement ) dup
> USING (sample, spec, temp, measurement)
> WHERE dup.count > 1
> LIMIT 1;
>
>
> So your next question. How to replace the existing unique data with a
> single averaage. Requesting the average is easy:
>
> > sample spec temp voltage measurement
> > 1 Tdsu 25 4.5 1.12e-9
> > 1 Tdsu 25 4.5 1.3e-9
>
> SELECT sample, spec, temp, voltage, AVG(measurement)
> FROM temp GROUP BY sample, spec, temp, voltage;
>
> So if you wanted to keep the average and throw away the detail data then
> just delete the old row and insert the new row (which you saved prior to
> deleting the data)
>
>
> Joshua b. Jore ; http://www.greentechnologist.org
>
> On Fri, 12 Jul 2002, Al Arduengo wrote:
>
> > I have a table of characterization data from a test of a certain chip where
> > I work. THe table consists of 5 columns:
> >
> > sample int
> > spec text
> > temp int
> > voltage float
> > measurement float
> >
> > Each sample chip (1-120) is tested at three temps and two voltages. The
> > column of interest is measurement. The problem is that some of the samples
> > were tested multiple times for one temp/voltage combination so I have
> > entries such as:
> >
> >
> > My requirement is to delete any duplicate rows (such as #2 in this case)
> > and then take the average of #1 and #3, replace #1 with that average
> > measurement and delete #3. I simply cannot figure out how to first get rid
> > of *just* #2 and then somehow find the case of #1 and #3 existing and then
> > do the average followed by removing #3. Using DISTINCT with SELECT will
> > obviously spit out #1 and #3 but that doesn't get me anywhere. I am
> > somewhat of a SQL novice so please have pity on me in your possible
> > explanations. My ultimate goal is to do these modifications without having
> > to go in and manually find these situations and then manually make the
> > changes. I think it is possible with SQL but I can't figure out how.
> >
> > Thanks very much in advance.
> > -Al Arduengo

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr. Home: ron(dot)l(dot)johnson(at)cox(dot)net |
| Jefferson, LA USA http://ronandheather.dhs.org:81 |
| |
| "Experience should teach us to be most on our guard to protect |
| liberty when the government's purposes are beneficent. Men |
| born to freedom are naturally alert to repel invasion of their |
| liberty by evil minded rulers. The greatest dangers to liberty |
| lurk in insidious encroachment by men of zeal, well-meaning |
| but without understanding." |
| Justice Louis Brandeis, dissenting, Olmstead v US (1928) |
+-----------------------------------------------------------------+

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Chad Thompson 2002-07-15 18:45:58 Seeding
Previous Message Hillensbeck, Preston 2002-07-15 16:19:31 Re: POSTGRESQL for WINDOWS NT4