Re: database errors

From: Michael Brusser <michael(at)synchronicity(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pgsql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: database errors
Date: 2004-05-14 00:08:55
Message-ID: DEEIJKLFNJGBEMBLBAHCKEHBEKAA.michael@synchronicity.com
Lists: pgsql-hackers

It looks like the "No such file or directory" error followed by the abort signal
resulted from manually removing logs. pg_resetxlog took care of this,
but other problems persisted.

I got a copy of the database and installed it on the local partition.
It does seem badly corrupted. Here are some of the hard errors.

pg_dump fails and dumps core:

pg_dump: ERROR: XLogFlush: request 0/A971020 is not satisfied --- flushed only to 0/5000050 ... lost synchronization with server, resetting connection

Looking at the core file:
(dbx) where 15
=>[1] _libc_kill(0x0, 0x6, 0x0, 0xffffffff, 0x2eaf00, 0xff135888), at 0xff19f938
[2] abort(0xff1bc004, 0xff1c3a4c, 0x0, 0x7efefeff, 0x21c08, 0x2404c4), at 0xff13596c
[3] elog(0x14, 0x267818, 0x0, 0xa971020, 0x0, 0x5006260), at 0x2407dc
[4] XLogFlush(0xffbee908, 0xffbee908, 0x827e0, 0x0, 0x0, 0x0), at 0x78530
[5] BufferSync(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x18df2c
[6] FlushBufferPool(0x2, 0x1e554, 0x0, 0x30000, 0x0, 0xffbeea79), at 0x18e5c4
[7] CreateCheckPoint(0x0, 0x0, 0x82c00, 0xff1bc004, 0x2212c, 0x83534), at 0x7d93c
[8] BootstrapMain(0x5, 0xffbeec50, 0x10, 0xffbeec50, 0xffbeebc8, 0xffbeebc8), at 0x836bc
[9] SSDataBase(0x3, 0x40a24a8a, 0x2e3800, 0x4, 0x2212c, 0x16f504), at 0x172590
[10] ServerLoop(0x5091, 0x2e398c, 0x2e3800, 0xff1c2940, 0xff1bc004, 0xff1c2940), at 0x16f3a0
[11] PostmasterMain(0x1, 0x323ad0, 0x2af000, 0x0, 0x65720000, 0x65720000), at 0x16ef88
[12] main(0x1, 0xffbef68c, 0xffbef694, 0x2eaf08, 0x0, 0x0), at 0x12864c
======================
(I don't have the debug build at the moment to get more details)

This query fails:
LOG: query: select count (1) from note_links_aux;
ERROR: XLogFlush: request 0/A971020 is not satisfied --- flushed only to 0/5006260
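The XLogFlush errors can be read as a comparison of two WAL locations, each printed as `XLOGID/XRECOFF` in hex. The minimal Python sketch below assumes, as with the 7.3 `XLogRecPtr` (two 32-bit fields compared log id first, then offset), that locations order lexicographically. `parse_lsn` and `flush_satisfied` are hypothetical names for illustration, not PostgreSQL functions. One plausible reading is that the requested location 0/A971020 lies beyond both reported flush points. That would be consistent with WAL having been reset to a point behind LSNs already stamped on data pages, for example after the manual log removal and pg_resetxlog run described above.

```python
def parse_lsn(text: str) -> int:
    """Turn a WAL location printed as 'XLOGID/XRECOFF' (two hex numbers)
    into one integer that sorts the same way: log id first, then offset."""
    xlogid, xrecoff = text.split("/")
    return (int(xlogid, 16) << 32) | int(xrecoff, 16)


def flush_satisfied(request: str, flushed: str) -> bool:
    """A flush request is satisfied only if WAL is already on disk
    at least up to the requested location."""
    return parse_lsn(request) <= parse_lsn(flushed)


# Locations copied from the error messages in this post.
for flushed in ("0/5000050", "0/5006260"):
    status = "satisfied" if flush_satisfied("0/A971020", flushed) else "NOT satisfied"
    print(f"request 0/A971020 vs flushed {flushed}: {status}")
```

Both comparisons report "NOT satisfied": 0xA971020 is larger than either 0x5000050 or 0x5006260.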

DROP TABLE fails:
drop table note_links_aux;
ERROR: getObjectDescription: Rule 17019 does not exist

Are there any pointers as to why this could happen, aside
from potential memory and disk problems?

As for NFS... I know how strongly the PostgreSQL community advises
against it, but we have to face it: our customers ARE running on NFS
and they WILL be running on NFS.
Is there such a thing as "better" and "worse" NFS versions?
(I made a note of what was said about hard vs. soft mounts, etc.)
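Tom's advice in the quoted message below (use a hard mount rather than a soft one, and try NFS over TCP instead of UDP) can be checked mechanically across customer machines. The following is a minimal Python sketch, not a tested tool. It assumes the whitespace-separated mount-table layout shared by Linux `/proc/mounts` and Solaris `/etc/mnttab` (device, mount point, filesystem type, comma-separated options, ...). The function name `nfs_mount_warnings` and the sample table are hypothetical.

```python
def nfs_mount_warnings(mount_table: str) -> list[str]:
    """Flag NFS entries that are explicitly soft-mounted or use UDP.

    Assumed layout per line: device, mount point, fs type, options, ...
    """
    warnings = []
    for line in mount_table.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue  # skip blank lines and non-NFS filesystems
        mount_point = fields[1]
        options = fields[3].split(",")
        if "soft" in options:
            warnings.append(f"{mount_point}: soft mount (prefer 'hard')")
        if "proto=udp" in options or "udp" in options:
            warnings.append(f"{mount_point}: NFS over UDP (consider TCP)")
    return warnings


# Hypothetical sample table for illustration.
sample = """\
server:/export/pgdata /var/lib/pgsql nfs rw,soft,proto=udp 0 0
server:/export/home /home nfs rw,hard,proto=tcp 0 0
/dev/sda1 / ext3 rw 0 0
"""
for warning in nfs_mount_warnings(sample):
    print(warning)
```

This only flags options that are spelled out. When an option is absent, the platform default applies, and the sketch does not try to work out what that default is.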

Tom, you recommended upgrading from 7.3.2 to 7.3.6.
Our next release uses 7.3.4 (maybe it's not too late to upgrade).
Would 7.3.6 provide more protection against problems like this?

Thank you,
Mike

> -----Original Message-----
... ...
> The messages you quote certainly read like a badly corrupted database to
> me. In the case of a local filesystem I'd be counseling you to start
> running memory and disk diagnostics. That may still be appropriate
> here, but you had better also reconsider the decision to use NFS.
>
> If you're absolutely set on using NFS, one possibly useful tip is to
> make sure it's a hard mount not a soft mount. If your systems support
> NFS-over-TCP instead of UDP, that might be worth trying too.
>
> Also I would strongly advise an update to PG 7.3.6. 7.3.2 has serious
> known bugs.
>
> regards, tom lane
>
