Re: Need help with error

From: Steven Saner <ssaner(at)pantheranet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need help with error
Date: 2000-07-05 21:51:02
Message-ID: 20000705165102.A15206@pantheranet.com
Lists: pgsql-general

On Wed, Jul 05, 2000 at 04:29:16PM -0400, Tom Lane wrote:
> Steven Saner <ssaner(at)pantheranet(dot)com> writes:
> > Using Postgres 7.0 on BSDI 4.1
> > For the last several days we are getting errors that look like this:
>
> > Error: cannot write block 0 of krftmp4 [adm] blind.
>
> > An interesting thing is that in this example, krftmp4 is a table that
> > the user that got this error message would not have accessed in any
> > way.
>
> Right --- that's implicit in the blind-write logic. A blind write
> means trying to dump out a dirty page from the shared buffer pool
> that belongs to a relation your own backend hasn't touched.
>
> Since the write fails, the dirty block remains in the shared buffer
> pool, waiting for some other backend to try to dump it again and fail
> again :-(
>
> The simplest recovery method is to restart the postmaster, causing a new
> buffer pool to be set up.
>
> However, from a developer's perspective, I'm more interested in finding
> out how you got into this state in the first place. We thought we'd
> fixed all the bugs that could give rise to orphaned dirty blocks, which
> was the cause of this type of error in all the cases we'd seen so far.
> Perhaps there is still a remaining bug of that kind, or maybe you've
> found a new way to cause this problem. Do you have time to do some
> investigation before you restart the postmaster?
>
> One thing I'd like to know is why the write is failing in the first
> place. Have you deleted or renamed the krftmp4 table, or its containing
> database adm, probably not too long before these errors started
> appearing?

Well, we have had this database version and configuration in operation
for a month or so. We rebooted the server on July 1 as part of our
normal maintenance procedure, and the problems only started appearing
after that reboot. The adm database has been around for a long time.
The krftmp4 table is not the only table that I have seen listed in
these error messages.

> > When this happens, it seems that the backend dies, which
> > ends up causing the backend connections for all users to die.
>
> That shouldn't be happening either; blind write failure is classed as
> a simple ERROR, not a FATAL error. Does any message appear in the
> postmaster log? Is a corefile dumped, and if so what do you get from
> a backtrace?

Unfortunately, I don't believe that there is any postmaster log. I will
probably have to restart the postmaster and redirect its stdout to a
file or something. As far as I can tell, no core files are being
created.

I will probably restart the postmaster tonight and make sure that
logging is being done. Then if it happens again we might have more to
go on.
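
For what it's worth, here is a rough sketch of the "redirect the output
to a file" idea, written as a small Python wrapper. It is not tied to
anything Postgres-specific: the helper name start_with_log and the log
file name are made up for illustration, and it sends stderr to the same
file as stdout on the assumption that error messages may arrive on
either stream.

```python
import subprocess


def start_with_log(command, log_path):
    """Start `command` in the background and append its stdout and
    stderr to the file at `log_path`. Returns the Popen object.

    Hypothetical helper for illustration only.
    """
    # Open in append mode so earlier log contents survive a restart.
    log_file = open(log_path, "ab")
    try:
        process = subprocess.Popen(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge stderr into the same file
        )
    finally:
        # The child process keeps its own copy of the descriptor, so
        # the parent can close its handle right away.
        log_file.close()
    return process
```

Something like start_with_log(["postmaster", "-D", "/usr/local/pgsql/data"],
"postmaster.log") would be the intended use, where the data directory
path is only a placeholder for wherever the installation actually lives.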

Steve
