Re: Checkpoint request failed on version 8.2.1.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Patrick Earl" <patearl(at)patearl(dot)net>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Checkpoint request failed on version 8.2.1.
Date: 2007-01-11 20:14:37
Message-ID: 29277.1168546477@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

"Patrick Earl" <patearl(at)patearl(dot)net> writes:
> In any case, the unit tests remove all contents and schema within the
> database before starting, and they remove the tables they create as
> they proceed. Certainly there are many things that have been recently
> deleted.

Yeah, I think then there's no question that the bgwriter is trying to
fsync something that's been deleted but isn't yet closed by every
process. We have things set up so that that's not a really serious
problem anymore --- eventually it will be closed and then the next
checkpoint will succeed. But CREATE DATABASE insists on checkpointing
and so it's vulnerable to even a transient failure.

I've been resisting changing the checkpoint code to treat EACCES as a
non-error situation on Windows, but maybe we have no choice. How do
people feel about this idea: under #ifdef WIN32, if the open or fsync
fails with EACCES, then

1. Emit a LOG (or maybe DEBUG) message noting the problem.
2. Leave the fsync request entry in the hashtable for next time.
3. Allow the current checkpoint to complete normally anyway.

If the file has actually been deleted, then eventually it will be closed
and the next checkpoint will be able to remove the hash entry. If
there's something else wrong, we'll keep bleating and maybe the DBA will
notice eventually.
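
The three steps above can be sketched in C roughly as follows. This is
only an illustration under stated assumptions, not the actual bgwriter
code: the PendingFsync struct, its simulated_errno field, and
process_pending_fsyncs() are hypothetical names, and the real open/fsync
call is replaced by a stored result so the sketch runs on any platform.
The tolerate_eacces flag stands in for the #ifdef WIN32 condition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one entry in the pending-fsync hashtable. */
typedef struct PendingFsync
{
    const char *path;            /* file the request refers to */
    int         simulated_errno; /* assumed outcome of open+fsync: 0 or an errno */
    bool        pending;         /* true while the request is still queued */
} PendingFsync;

/*
 * Process every queued request.  Returns true if the checkpoint may
 * complete, false if it must fail.
 *
 * When tolerate_eacces is true (the proposal: only on Windows), an EACCES
 * failure is logged, the entry is left queued for the next checkpoint,
 * and processing continues.  Any other failure still aborts.
 */
bool
process_pending_fsyncs(PendingFsync *reqs, size_t n, bool tolerate_eacces)
{
    assert(reqs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (!reqs[i].pending)
            continue;

        int err = reqs[i].simulated_errno;  /* stands in for open() + fsync() */

        if (err == 0)
        {
            reqs[i].pending = false;        /* synced: drop the request */
            continue;
        }

        if (tolerate_eacces && err == EACCES)
        {
            /* Step 1: note the problem at LOG level. */
            fprintf(stderr, "LOG: could not fsync \"%s\": %s; will retry\n",
                    reqs[i].path, strerror(err));
            /* Step 2: leave the entry queued.  Step 3: keep going. */
            continue;
        }

        fprintf(stderr, "ERROR: could not fsync \"%s\": %s\n",
                reqs[i].path, strerror(err));
        return false;                       /* checkpoint fails */
    }
    return true;
}
```

With tolerate_eacces set, an entry that fails with EACCES stays pending
and the call still returns true; once the file has been closed everywhere
and the sync succeeds, a later call clears the entry. With the flag
unset, the same failure makes the checkpoint fail, which is the behavior
CREATE DATABASE currently trips over.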

The downside of this is that a real EACCES problem wouldn't get noted at
any level higher than LOG, and so you could theoretically lose data
without much warning. But I'm not seeing anything else we could do
about it --- AFAIK we have not heard of a way we can distinguish this
case from a real permissions problem. And anyway there should never
*be* a real permissions problem; if there is then the user's been poking
under the hood enough to void the warranty anyway ;-)

Comments?

regards, tom lane
