Re: Postgres abort found in 9.3.11

From: "K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Itnal, Prakash (Nokia - IN/Bangalore)" <prakash(dot)itnal(at)nokia(dot)com>
Subject: Re: Postgres abort found in 9.3.11
Date: 2016-09-06 09:57:33
Message-ID: DB5PR07MB154194841E3C44F727C65F78D6F90@DB5PR07MB1541.eurprd07.prod.outlook.com
Lists: pgsql-hackers

Hello,

I was able to find a patch file that adds a call to ExitPostmaster() in postmaster.c:

@@ -3081,6 +3081,11 @@
shmem_exit(1);
reset_shared(PostPortNumber);

+ /* recovery termination */
+ ereport(FATAL,
+ (errmsg("recovery termination due to process crash")));
+ ExitPostmaster(99);
+
StartupPID = StartupDataBase();
Assert(StartupPID != 0);
pmState = PM_STARTUP;

However, this patch has been in place since 2009, when Postgres was upgraded to 9.0. I am checking why it was introduced in the first place.
The question remains why the issue is not seen in version 9.3.9 but does appear in 9.3.11.

Also, the standalone recovery case is taken care of by the introduction of this patch.

"err-3" is part of the Postgres source code (nbtxlog.c). Two different log lines were combined, which probably led to the confusion. The separate lines are:
Aug 22 11:44:52.065760 crit node-1 postgres[8629]: [18-1] err-3: btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data
Aug 22 11:44:52.065971 crit node-1 postgres[8629]: [18-2] CONTEXT: xlog redo delete: index 1663/16386/17378; iblk 1, heap 1663/16386/16518;
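
For anyone who wants to check other excerpts for the same kind of run-together lines, here is a minimal sketch that starts a new record wherever a syslog-style timestamp appears. The function name split_fused_records and the timestamp pattern are illustrative assumptions based only on the excerpts in this thread; they are not part of Postgres or of any logging tool.

```python
import re

# Assumed timestamp shape, taken from the excerpts in this thread,
# e.g. "Aug 22 11:44:52.065760 ". Other syslog formats would need another pattern.
TIMESTAMP = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d+ ")


def split_fused_records(text):
    """Split text into records, starting a new record at each timestamp.

    Hypothetical helper for illustration only.
    """
    starts = [m.start() for m in TIMESTAMP.finditer(text)]
    if not starts or starts[0] != 0:
        # Keep any leading text that does not begin with a timestamp.
        starts.insert(0, 0)
    bounds = starts + [len(text)]
    pieces = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [piece for piece in pieces if piece]


# The run-together line as it appeared in the quoted message.
fused = (
    "Aug 22 11:44:52.065760 crit node-1 postgres[8629]: [18-1] err-3: "
    "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data"
    "Aug 22 11:44:52.065971 crit CFPU-1 postgres[8629]: [18-2] CONTEXT: "
    "xlog redo delete: index 1663/16386/17378; iblk 1, heap 1663/16386/16518;"
)

records = split_fused_records(fused)
for record in records:
    print(record)
```

Splitting on the timestamp rather than on newlines is what lets it recover records that were pasted together without a line break.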

Thanks in advance!!!
Sandhya

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Thursday, September 01, 2016 7:19 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya(dot)k_s(at)nokia(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org; Itnal, Prakash (Nokia - IN/Bangalore) <prakash(dot)itnal(at)nokia(dot)com>
Subject: Re: [HACKERS] Postgres abort found in 9.3.11

"K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com> writes:
> Our setup is a hot-standby architecture. This crash is occurring only on stand-by node. Postgres continues to run without any issues on active node.
> Postmaster is waiting for a start and is throwing this message.

> Aug 22 11:44:21.462555 info node-0 postgres[8222]: [1-2] HINT: Is another postmaster already running on port 5433? If not, wait a few seconds and retry.
> Aug 22 11:44:52.065760 crit node-1 postgres[8629]: [18-1] err-3: btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent dataAug 22 11:44:52.065971 crit CFPU-1 postgres[8629]: [18-2] CONTEXT: xlog redo delete: index 1663/16386/17378; iblk 1, heap 1663/16386/16518;

Hmm, that HINT seems to be the tail end of a message indicating that the
postmaster is refusing to start because of an existing postmaster. Why
is that appearing? If you've got some script that's overeagerly launching
and killing postmasters, maybe that's the ultimate cause of problems.

The only method I've heard of for getting that get_latestRemovedXid
error is to try to launch a standalone backend (postgres --single)
in a standby server directory. We don't support that, cf
https://www.postgresql.org/message-id/flat/00F0B2CEF6D0CEF8A90119D4%40eje.credativ.lan

BTW, I'm curious about the "err-3:" part. That would not be expected
in any standard build of Postgres ... is this something custom modified?

regards, tom lane
