Re: clarifying a few error messages

From: Thomas O'Connell <tfo(at)monsterlabs(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: clarifying a few error messages
Date: 2003-01-13 18:39:15
Message-ID: tfo-5A3A0E.12391513012003@news.hub.org
Lists: pgsql-general

So I've managed to determine that the interrupt messages most likely
coincided with the server reboots.

Could the same thing have caused the signal 11? An unexpected external
event?

And is exit code 2 just related to the bad clog?

-tfo

In article <tfo-C7DADB(dot)15104309012003(at)news(dot)hub(dot)org>,
Thomas O'Connell <tfo(at)monsterlabs(dot)com> wrote:

> i'm hoping someone might be able to help me understand some of what
> might have been going on in the environment external to postgres, based on
> some error messages i just got in the logs of one of my pg installations.
>
> in a period of duress (i.e., the box itself was rebooting and postgres
> was dying) on one of my servers, i saw the following a few times:
>
> ERROR: deadlock detected
>
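For anyone reading along: a deadlock like this generally means two
transactions took row locks on the same rows in opposite order. A minimal
sketch that can produce the same error (the table and ids here are made up
for illustration):

    -- session 1
    BEGIN;
    UPDATE accounts SET n = n + 1 WHERE id = 1;

    -- session 2
    BEGIN;
    UPDATE accounts SET n = n + 1 WHERE id = 2;
    UPDATE accounts SET n = n + 1 WHERE id = 1;  -- blocks, waiting on session 1

    -- session 1 again
    UPDATE accounts SET n = n + 1 WHERE id = 2;  -- blocks on session 2; once
                                                 -- deadlock_timeout passes, one of
                                                 -- the two transactions is aborted
                                                 -- with "ERROR: deadlock detected"
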
> shortly afterward, there was this:
>
> DEBUG: database system was interrupted at <timestamp>
>
> what can interrupt the database? i've seen it get terminated by signal 9
> when the box is failing, but is there any way to know what might have
> interrupted it? it seems to have died altogether, as it then recovers
> from the last checkpoint record, etc.
>
> after it starts up, this:
>
> DEBUG: database system is ready
> DEBUG: server process (pid 882) was terminated by signal 11
>
> does this mean postgres itself segfaulted or that it received an
> external SIGSEGV from a critical system process (e.g., the kernel)?
>
> also, is there any significance to the fact that those two statements
> occurred one right after the other?
>
> a little later, after another recovery, i see these:
>
> DEBUG: all server processes terminated; reinitializing shared memory
> and semaphores
> DEBUG: database system was interrupted at <timestamp>
>
> finally, it seems to stabilize for a bit. then, a little later, there was
> a whole spew of garbage characters in the log, immediately preceding another:
>
> DEBUG: database system was interrupted at <timestamp>
>
> how would garbage data end up in the log? does that indicate anything
> about the manner in which postgres was interrupted?
>
> after yet another recovery, i see the following:
>
> FATAL 1: The database system is starting up
> FATAL 2: open of $PGDATA/pg_clog/0419 failed: No such file or directory
> FATAL 2: open of $PGDATA/pg_clog/0419 failed: No such file or directory
> FATAL 2: open of $PGDATA/pg_clog/0419 failed: No such file or directory
> FATAL 2: open of $PGDATA/pg_clog/0419 failed: No such file or directory
> FATAL 2: open of $PGDATA/pg_clog/0419 failed: No such file or directory
> DEBUG: server process (pid 945) exited with exit code 2
>
> here, what is exit code 2? does that just mean that postgres found a
> significant problem with clog files?
>
> then, there was one more of the shared memory/interruption pairs, a
> final recovery, and smooth sailing again.
>
> i guess i'd like to be able to determine if this is a system resources
> issue, and if so, which system resources. is this sequence something
> that can be prevented in the future via postgresql.conf? more memory?
>
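On the postgresql.conf angle: the memory-related knobs in that era of
PostgreSQL are roughly the ones below. The numbers are placeholders meant
only to show which settings exist, not tuning advice for this particular box:

    # postgresql.conf -- illustrative values only
    max_connections = 64
    shared_buffers = 4096           # 8kB pages (~32MB); must fit under the kernel's SHMMAX
    sort_mem = 4096                 # kB per sort/hash operation
    vacuum_mem = 16384              # kB available to VACUUM
    effective_cache_size = 65536    # 8kB pages the OS is expected to cache
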
> is there any way of knowing what actually brought down postgres from
> these messages?
>
> this installation is on a Linux box running kernel 2.4.18 with 1GB RAM.
>
> i can provide postgresql.conf settings upon request.
>
> thanks!
>
> -tfo
