Re: duplicate primary index in bayes db from SpamAssassin

From: Michael Monnerie <michael(dot)monnerie(at)it-management(dot)at>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: duplicate primary index in bayes db from SpamAssassin
Date: 2008-04-15 06:26:31
Message-ID: 200804150826.37501@zmi.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Montag, 14. April 2008 Tom Lane wrote:
> What PG version is this?

I've looked a little more into history, and PostgreSQL was 8.1.5
originally from CD, then 8.1.9 and now 8.1.11.

I'm currently investigating a way to delete duplicates from the table.
Is there any chance to do that? A
SELECT * from bayes_token;
shows no dups, as there's a primary index... at least it works here.

This is the bayes_token table and its index:
CREATE TABLE bayes_token (
id integer DEFAULT 0 NOT NULL,
token bytea DEFAULT ''::bytea NOT NULL,
spam_count integer DEFAULT 0 NOT NULL,
ham_count integer DEFAULT 0 NOT NULL,
atime integer DEFAULT 0 NOT NULL
);
ALTER TABLE ONLY bayes_token
ADD CONSTRAINT bayes_token_pkey PRIMARY KEY (id, token);

Could I create a new db without indices, restore the data there, and
then run a "delete from.. where duplicates"? Sometimes there's 3x the
same content in the primary index, but I'd need to delete all except
the one with the highest spam_count.

I just found I even have a duplicate in bayes_vars:

COPY bayes_vars (id, username, spam_count, ham_count, token_count,
last_expire, last_atime_delta, last_expire_reduce, oldest_token_age,
newest_token_age) FROM stdin;
1 vscan 194393 517531 2602114 1206030039 2764800 15304
1203265204 1206057801
1 vscan 194398 517535 2602799 1206030039 2764800 15304
1203265204 1206064729
4 vscan 18305 25403 2042983 1208143427 1382400 13268
1206501543 1208216540
\.

Definition:
CREATE TABLE bayes_vars (
id serial NOT NULL,
username character varying(200) DEFAULT ''::character varying NOT
NULL,
spam_count integer DEFAULT 0 NOT NULL,
ham_count integer DEFAULT 0 NOT NULL,
token_count integer DEFAULT 0 NOT NULL,
last_expire integer DEFAULT 0 NOT NULL,
last_atime_delta integer DEFAULT 0 NOT NULL,
last_expire_reduce integer DEFAULT 0 NOT NULL,
oldest_token_age integer DEFAULT 2147483647 NOT NULL,
newest_token_age integer DEFAULT 0 NOT NULL
);
ALTER TABLE ONLY bayes_vars
ADD CONSTRAINT bayes_vars_pkey PRIMARY KEY (id);

But here, I could see the dups with SELECT, and even delete one record:

# select * from bayes_vars;
id | username | spam_count | ham_count | token_count | last_expire |
last_atime_delta | last_expire_reduce | oldest_token_age |
newest_token_age
----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------
1 | vscan | 194393 | 517531 | 2602114 | 1206030039 |
2764800 | 15304 | 1203265204 | 1206057801
1 | vscan | 194398 | 517535 | 2602799 | 1206030039 |
2764800 | 15304 | 1203265204 | 1206064729
4 | vscan | 18375 | 25828 | 2050196 | 1208229525 |
1382400 | 13268 | 1206501543 | 1208240610
(3 Zeilen)

bayes_pg_v1=# delete from bayes_vars where spam_count =194393;
DELETE 1
bayes_pg_v1=# select * from bayes_vars;
id | username | spam_count | ham_count | token_count | last_expire |
last_atime_delta | last_expire_reduce | oldest_token_age |
newest_token_age
----+----------+------------+-----------+-------------+-------------+------------------+--------------------+------------------+------------------
1 | vscan | 194398 | 517535 | 2602799 | 1206030039 |
2764800 | 15304 | 1203265204 | 1206064729
4 | vscan | 18375 | 25829 | 2050215 | 1208229525 |
1382400 | 13268 | 1206501543 | 1208240637
(2 Zeilen)

mfg zmi
--
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0676/846 914 666 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: www.keyserver.net Key-ID: 1C1209B4

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Mikko Partio 2008-04-15 06:36:33 Re: FATAL: could not open relation xxx: No such file or directory
Previous Message yogesh 2008-04-15 05:11:27 Restore Database From data folder