ACK table corrupted, unique index violated.

From: "Brian Hirt" <bhirt(at)mobygames(dot)com>
To: "Postgres Hackers" <pgsql-hackers(at)postgresql(dot)org>, <pgsql-general(at)postgresql(dot)org>
Cc: "Brian A Hirt" <bhirt(at)berkhirt(dot)com>
Subject: ACK table corrupted, unique index violated.
Date: 2001-12-12 15:10:07
Message-ID: 000b01c1831f$1b2ad020$640b0a0a@berkhirt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

I'm having a very scary problem.

First, here's my system: smp dual PIII800 512MB memory running redhat 6.2
kernel 2.2.18
PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66

In a nutshell, my primary key index got a NOTICE to recreate when the
database was vacuumed. I dropped the index and tried to recreate it. I get
a key violation when i try to do this. I find there are some 200 rows
with the exact same developer_id and oid.

This is a very serious problem. 1) the uinque index should have prevented
this from happening. 2) i looked at my code and there is absolutely no way
my code inserted multiple rows with the same id.

This leads me to believe that there is a big problem with postgres; possibly
in vacuum. This has also underminded my confidence in postgres data
integrity that such a basic concept can be violated.

I want to help you guys find this problem because I have a lot invested in
postgres and overall have been very happy with it. I've included
information that i think might be useful. If there is more that i can
supply, let me know and I will provide it if I can.

My system runs a vacuum every day at 4AMCST. I've checked all of my
application logs for the day before the NOTICE appeared and the day that it
appeared. I see no SQL errors logged (and all SQL errors are logged by my
applications) for either day. I've checked the postmaster logs for both
days and don't see any ERROR's logged. There are some NOTICES the day
before that i don't know what they mean, but don't look good.

NOTICE: Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19833 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory
NOTICE: Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19839 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory
NOTICE: Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19835 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory
NOTICE: Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19834 to
/moby/pgsql/base/156130/pg_internal.init: No such file or directory
NOTICE: Cannot rename init file
/moby/pgsql/base/156130/pg_internal.init.19837 to
/moby/pgsql/base/156130/pg_internal.init: No such file or direc

however, on further inspection, the error above appears fairly frequently.
In fact, the first occurance was over 6 weeks before the corruption.

Here is the DEBUG notices for vacuum the day before the corruption:

DEBUG: --Relation developer--
DEBUG: Pages 514: Changed 29, reaped 39, Empty 0, New 0; Tup 47971: Vac 52,
Keep/VTL 0/0, Crash 0, UnUsed 89, MinLen 65, MaxLen 133; Re-using:
Free/Avail. Space 6812/1768; EndEmpty/Avail. Pages 0/10. CPU 0.04s/0.00u
sec.
DEBUG: Index developer_primary_key: Pages 120; Tuples 47971: Deleted 52.
CPU 0.00s/0.09u sec.
DEBUG: Index developer_recent_mod_key: Pages 119; Tuples 47971: Deleted 52.
CPU 0.00s/0.05u sec.
DEBUG: Index developer_approved: Pages 121; Tuples 47971: Deleted 52. CPU
0.03s/0.06u sec.
DEBUG: Index developer_search_idx: Pages 204; Tuples 47971: Deleted 52. CPU
0.00s/0.04u sec.
DEBUG: Rel developer: Pages: 514 --> 514; Tuple(s) moved: 15. CPU
0.01s/0.02u sec.
DEBUG: Index developer_primary_key: Pages 120; Tuples 47971: Deleted 15.
CPU 0.00s/0.05u sec.
DEBUG: Index developer_recent_mod_key: Pages 119; Tuples 47971: Deleted 15.
CPU 0.00s/0.04u sec.
DEBUG: Index developer_approved: Pages 121; Tuples 47971: Deleted 15. CPU
0.00s/0.05u sec.
DEBUG: Index developer_search_idx: Pages 204; Tuples 47971: Deleted 15. CPU
0.00s/0.05u sec.

Here is the vaccum DEBUG messages the day of the corruption:

DEBUG: --Relation developer--
DEBUG: Pages 515: Changed 25, reaped 39, Empty 0, New 0; Tup 48038: Vac 53,
Keep/VTL 0/0, Crash 0, UnUsed 89, MinLen 65, MaxLen 133; Re-using:
Free/Avail. Space 9144/3348; EndEmpty/Avail. Pages 0/9. CPU 0.02s/0.01u sec.
DEBUG: Index developer_primary_key: Pages 120; Tuples 48023: Deleted 53.
CPU 0.00s/0.04u sec.
NOTICE: Index developer_primary_key: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
Recreate the index.
DEBUG: Index developer_recent_mod_key: Pages 119; Tuples 48023: Deleted 53.
CPU 0.00s/0.04u sec.
NOTICE: Index developer_recent_mod_key: NUMBER OF INDEX' TUPLES (48023) IS
NOT THE SAME AS HEAP' (48038).
Recreate the index.
DEBUG: Index developer_approved: Pages 121; Tuples 48023: Deleted 53. CPU
0.00s/0.04u sec.
NOTICE: Index developer_approved: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
Recreate the index.
DEBUG: Index developer_search_idx: Pages 204; Tuples 48023: Deleted 53. CPU
0.00s/0.04u sec.
NOTICE: Index developer_search_idx: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
Recreate the index.
DEBUG: Rel developer: Pages: 515 --> 515; Tuple(s) moved: 34. CPU
0.00s/0.03u sec.
DEBUG: Index developer_primary_key: Pages 120; Tuples 48023: Deleted 34.
CPU 0.00s/0.04u sec.
NOTICE: Index developer_primary_key: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
Recreate the index.
DEBUG: Index developer_recent_mod_key: Pages 119; Tuples 48023: Deleted 34.
CPU 0.00s/0.04u sec.
NOTICE: Index developer_recent_mod_key: NUMBER OF INDEX' TUPLES (48023) IS
NOT THE SAME AS HEAP' (48038).
Recreate the index.
DEBUG: Index developer_approved: Pages 121; Tuples 48023: Deleted 34. CPU
0.00s/0.03u sec.
NOTICE: Index developer_approved: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
Recreate the index.
DEBUG: Index developer_search_idx: Pages 204; Tuples 48023: Deleted 34. CPU
0.00s/0.06u sec.
NOTICE: Index developer_search_idx: NUMBER OF INDEX' TUPLES (48023) IS NOT
THE SAME AS HEAP' (48038).
Recreate the index.

I also appear to be getting this quite often:

NOTICE: RegisterSharedInvalid: SI buffer overflow
NOTICE: InvalidateSharedInvalid: cache state reset

Here is output from me trying to re-create an index.

basement=# drop index developer_primary_key;
DROP
basement=# create unique index developer_primary_key on
developer(developer_id);ERROR: Cannot create unique index. Table contains
non-unique values
basement=# select developer_id,count(*) from developer group by developer_id
having count(*) > 1;
developer_id | count
--------------+-------
11107 | 2
18493 | 2
50983 | 2
50984 | 2
50985 | 2
50986 | 2
50987 | 2
50988 | 2
50989 | 2
50990 | 2
50991 | 2
50992 | 2
50993 | 2
50994 | 2
50995 | 2
50996 | 2
50997 | 2
51020 | 2
51021 | 2
51022 | 2
51023 | 2
51024 | 2
51025 | 2
51026 | 2
51029 | 2
51030 | 2
51031 | 2
51032 | 2
51033 | 2
51034 | 2
51035 | 2
51036 | 2
51037 | 2
51038 | 2
51039 | 2
51040 | 2
51041 | 2
51042 | 2
51043 | 3
51044 | 3
51045 | 3
51046 | 3
51047 | 3
51048 | 3
51049 | 3
51050 | 3
51051 | 3
51052 | 3
51053 | 3
51054 | 3
51055 | 3
51056 | 3
51057 | 3
51058 | 3
51059 | 3
51060 | 3
51061 | 3
51062 | 2
51063 | 2
51064 | 2
51065 | 2
51066 | 2
51067 | 2
51068 | 2
51069 | 2
51070 | 2
51071 | 2
51072 | 2
51073 | 2
51074 | 2
51075 | 2
51076 | 2
51152 | 2
51614 | 2
51615 | 2
51616 | 2
51617 | 2
51618 | 2
51619 | 2
51620 | 2
51621 | 2
51622 | 2
51623 | 2
51678 | 2
51680 | 2
51681 | 2
51682 | 2
51683 | 2
51768 | 2
51862 | 2
51863 | 2
51864 | 2
51950 | 2
52094 | 2
52095 | 2
52096 | 2
52097 | 2
52098 | 2
52099 | 2
52100 | 2
52101 | 2
52103 | 2
52104 | 2
52105 | 2
52106 | 2
52107 | 2
52108 | 2
52109 | 2
52110 | 2
52111 | 2
52112 | 2
52113 | 2
52114 | 2
52115 | 2
52116 | 2
52117 | 2
52118 | 2
52119 | 2
52120 | 2
52121 | 2
52122 | 2
52123 | 2
52124 | 2
52125 | 2
52126 | 2
52127 | 2
52128 | 2
52129 | 2
52130 | 2
52131 | 2
52132 | 2
52133 | 2
52134 | 2
52135 | 2
52136 | 2
52137 | 2
52167 | 2
52168 | 2
52169 | 2
52170 | 2
52171 | 2
52172 | 2
52173 | 2
52174 | 2
52175 | 2
52180 | 2
52181 | 2
52182 | 2
52222 | 2
52223 | 2
52224 | 2
52225 | 2
52226 | 2
52227 | 2
52228 | 2
52229 | 2
52230 | 2
52232 | 2
52233 | 2
52234 | 2
52235 | 2
52236 | 2
52237 | 2
52238 | 2
52239 | 2
52240 | 2
52241 | 2
52242 | 2
52243 | 2
52244 | 2
52245 | 2
52246 | 2
52247 | 2
52248 | 2
52249 | 2
52250 | 2
52251 | 2
52465 | 2
52466 | 2
52467 | 2
52468 | 2
52469 | 2
52470 | 2
52471 | 2
52472 | 2
52473 | 2
52474 | 2
52475 | 2
52476 | 2
52477 | 2
52478 | 2
52479 | 2
52480 | 2
52481 | 2
52482 | 2
52483 | 2
52484 | 2
52485 | 2
52486 | 2
52487 | 2
(200 rows)

basement=# select oid,developer_id from developer where developer_id =52487;
oid | developer_id
-----------+--------------
401440180 | 52487
401440180 | 52487

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Carlo Perassi 2001-12-12 15:26:54 exit status code table?
Previous Message Thomas Lockhart 2001-12-12 14:42:13 Re: about rserv!

Browse pgsql-hackers by date

  From Date Subject
Next Message mlw 2001-12-12 16:13:37 Re: Explicit configuration file
Previous Message Marc G. Fournier 2001-12-12 14:28:34 Re: Beta4 ...