Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint

From: Travis Cross <travis(at)crosswirecorp(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Duplicate rows sneaking in despite PRIMARY KEY / UNIQUE constraint
Date: 2006-06-06 09:45:26
Message-ID: 44854EB6.8020606@crosswirecorp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have a table that I am using to store email token data for DSPAM.

I'm noticing that a handful (4-16) of rows with duplicate columns
(uid,token) are sneaking into the table every day despite the
primary key constraint.

The server currently processes a few thousand emails per day, and
this particular table currently has about 4.5 million rows in it.

I feel as though I must be missing something here, because I have
always strongly assumed that postgresql prevents this sort of chaos
from happening by default.

When these duplicate pairs make their way into the table, all havoc
breaks loose ;) The rows with the duplicate pairs seem to become
land mines. The postgresql process handling a query that
subsequently runs across one of these rows dies, taking down
the DSPAM daemon with it, and sometimes corrupting the postgresql
shared memory enough that the postmaster has to shutdown the other
processes and restart everything anew [1].

I am usually able to clean things up by running the following, but
once or twice I've had to drop the unique constraint before
postgresql would process the request without choking:

delete from dspam_token_data
where row(uid,token) in
(select uid,token
from dspam_token_data
group by uid,token
having count(*) > 1);

(I don't worry about preserving one of the duplicates here.)

I'm running postgresql-8.1.3. Here is the table in question:

CREATE TABLE dspam.dspam_token_data
(
uid int4 NOT NULL,
token int8 NOT NULL,
spam_hits int4,
innocent_hits int4,
last_hit date,
CONSTRAINT dspam_token_data_pkey PRIMARY KEY (uid, token)
)
WITHOUT OIDS;

[2]

What steps can I take to fix this? Is there any additional
information I can provide?

I've cleaned the table many times now. I then run VACUUM ANALYZE.

My next step will probably be to hack the DSPAM sources to make the
application more careful about not trying to insert rows that would
violate the unique constraint. Even still, it seems that would only
reduce the frequency of these occurrences, not eliminate them
completely.

Thanks!

Cheers,

-- Travis

----
Notes:
[1] A condensed log file showing off exactly what happens here is
attached.

[2] Previously, the table lacked a primary key and instead used a
unique constraint and index. This yielded the exact same results I
am currently seeing using a two-column primary key, as above. The
old table schema was:

CREATE TABLE dspam.dspam_token_data
(
uid int4,
token int8,
spam_hits int4,
innocent_hits int4,
last_hit date,
CONSTRAINT dspam_token_data_uid_key UNIQUE (uid, token)
)
WITHOUT OIDS;

Attachment Content-Type Size
postgresql-crash.txt text/plain 2.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joachim Wieland 2006-06-06 10:33:12 AIX check in datetime.h
Previous Message ITAGAKI Takahiro 2006-06-06 09:02:19 table/index fillfactor control