Re: 7.0.3 database corruption

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 7.0.3 database corruption
Date: 2001-06-14 14:34:48
Message-ID: 3B28CB88.57B9442@tm.ee
Lists: pgsql-hackers

Tom Lane wrote:
>
> > mlw wrote:
> >> After we run the
> >> scripts, it looks like the database is corrupt.
>
> It's impossible to say anything useful with such an undescriptive
> description of the problem.
>
> Hannu Krosing <hannu(at)tm(dot)ee> writes:
> > There certainly are bugs in 7.0.3 - I can describe at least two:
>
> I would really like to see a reproducible example of index corruption
> in 7.0.*. We've heard such reports often enough to know the problem
> is real, but without a test case in hand it's difficult to do much about
> it.

I know ;( Unfortunately this has happened only a few times on some quite
busy servers receiving a workload of quite varied queries.

> > 2. Some kind of stuck locks - a single backend stuck in "INSERT waiting"
>
> 7.0.*'s deadlock detection algorithm is known to have some holes, but
> deadlock couldn't be the explanation for just a single stuck backend.

This is what the "ps ax | grep post" output looks like in my logs:

Sun Jun 10 06:31:00 EET 2001
828 ? S 0:02 /usr/bin/postmaster -i -o -F
26652 ? S 5:20 /usr/bin/postgres localhost gamer casino idle
30082 ? S 0:20 /usr/bin/postgres 127.0.0.1 nobody casino idle
30084 ? S 1:26 /usr/bin/postgres 127.0.0.1 nobody casino idle
31565 ? S 0:43 /usr/bin/postgres 127.0.0.1 nobody casino idle
31595 ? S 0:19 /usr/bin/postgres 127.0.0.1 nobody casino idle
31596 ? S 0:21 /usr/bin/postgres 127.0.0.1 nobody casino idle
31597 ? S 0:31 /usr/bin/postgres 127.0.0.1 nobody casino idle
31598 ? S 1:39 /usr/bin/postgres 127.0.0.1 nobody casino idle
31600 ? S 0:17 /usr/bin/postgres 127.0.0.1 nobody casino idle
31608 ? S 0:24 /usr/bin/postgres 127.0.0.1 nobody casino idle
31612 ? S 0:24 /usr/bin/postgres 127.0.0.1 nobody casino idle
32080 ? S 0:43 /usr/bin/postgres localhost gamer casino UPDATE waiti
32706 ? S 0:10 /usr/bin/postgres localhost gamer casino idle
302 ? S 0:00 /usr/bin/postgres 127.0.0.1 nobody casino idle
361 ? S 0:00 sh -c date;ps ax|grep post
364 ? S 0:00 grep post

CHECKING WAITING PIDS: ['32080']
Sun Jun 10 06:31:10 EET 2001
828 ? S 0:02 /usr/bin/postmaster -i -o -F
26652 ? S 5:20 /usr/bin/postgres localhost gamer casino idle
30082 ? S 0:20 /usr/bin/postgres 127.0.0.1 nobody casino idle
30084 ? S 1:26 /usr/bin/postgres 127.0.0.1 nobody casino idle
31565 ? S 0:43 /usr/bin/postgres 127.0.0.1 nobody casino idle
31595 ? S 0:19 /usr/bin/postgres 127.0.0.1 nobody casino idle
31596 ? S 0:21 /usr/bin/postgres 127.0.0.1 nobody casino idle
31597 ? S 0:31 /usr/bin/postgres 127.0.0.1 nobody casino idle
31598 ? S 1:39 /usr/bin/postgres 127.0.0.1 nobody casino idle
31600 ? S 0:17 /usr/bin/postgres 127.0.0.1 nobody casino idle
31608 ? S 0:24 /usr/bin/postgres 127.0.0.1 nobody casino idle
31612 ? S 0:24 /usr/bin/postgres 127.0.0.1 nobody casino idle
32080 ? S 0:43 /usr/bin/postgres localhost gamer casino UPDATE waiti
32706 ? S 0:10 /usr/bin/postgres localhost gamer casino idle
302 ? S 0:00 /usr/bin/postgres 127.0.0.1 nobody casino idle
365 ? S 0:00 sh -c date;ps ax|grep post
368 ? S 0:00 grep post

PROCESS 32080 STILL WAITING, RESTART TIME

> Again, any chance of looking at an example?

I could send you tails of the postgres logfiles, which are rotated
whenever an INSERT/UPDATE wait condition is detected that does not go
away within 10 seconds.
How much log (in terms of time) would be enough?

There seems to be no general pattern that leads to it though ;(
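
For illustration, here is a minimal sketch of the kind of check that could
produce the "CHECKING WAITING PIDS" / "STILL WAITING" lines above. It is not
the actual monitoring script, which is not shown in this message. The function
names (waiting_pids, still_waiting, check_once) and the regular expression are
assumptions; only the "ps ax | grep post" command and the 10-second recheck
come from the text. The restart step is deliberately left out.

```python
import re
import subprocess
import time

# A backend line whose status mentions "INSERT wait..." or "UPDATE wait...".
# ps may truncate the status (e.g. "UPDATE waiti"), so only the prefix is matched.
WAIT_RE = re.compile(r"^\s*(\d+)\s.*\bpostgres\b.*\b(?:INSERT|UPDATE)\s+wait")


def waiting_pids(ps_output):
    """Return PIDs (as strings) of backends shown as INSERT/UPDATE waiting."""
    pids = []
    for line in ps_output.splitlines():
        match = WAIT_RE.match(line)
        if match:
            pids.append(match.group(1))
    return pids


def still_waiting(before, after):
    """PIDs that were waiting in both snapshots, in first-snapshot order."""
    after_set = set(after)
    return [pid for pid in before if pid in after_set]


def snapshot():
    """Run the same command that appears in the logs and return its output."""
    result = subprocess.run("ps ax | grep post", shell=True,
                            capture_output=True, text=True)
    return result.stdout


def check_once(delay=10, take_snapshot=snapshot, sleep=time.sleep):
    """Return PIDs that are waiting now and still waiting after `delay` seconds."""
    first = waiting_pids(take_snapshot())
    if not first:
        return []
    sleep(delay)
    return still_waiting(first, waiting_pids(take_snapshot()))
```

Called periodically, check_once() returns an empty list in the normal case and
a list of PIDs when a wait has lasted across both snapshots; what to do then
(rotate the logfile, restart) is left to the caller. Note that a backend which
finishes one wait and starts another within the 10 seconds would also be
reported, so this is only a rough heuristic.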

---------------
Hannu
