
Re: Postgres abort found in 9.3.11

From:	"K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Itnal, Prakash (Nokia - IN/Bangalore)" <prakash(dot)itnal(at)nokia(dot)com>
Subject:	Re: Postgres abort found in 9.3.11
Date:	2016-09-01 10:45:51
Message-ID:	DB5PR07MB154156B5B062C8769E8A569ED6E20@DB5PR07MB1541.eurprd07.prod.outlook.com
Lists:	pgsql-hackers

Hello Tom,

Apologies for the delayed reply.

Our setup uses a hot-standby architecture. This crash occurs only on the standby node; Postgres continues to run without any issues on the active node.
The postmaster is waiting to start and logs the following messages:

Aug 22 11:44:21.462555 info node-0 postgres[8222]: [1-2] HINT: Is another postmaster already running on port 5433? If not, wait a few seconds and retry.
Aug 22 11:44:52.065760 crit node-1 postgres[8629]: [18-1] err-3: btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data
Aug 22 11:44:52.065971 crit CFPU-1 postgres[8629]: [18-2] CONTEXT: xlog redo delete: index 1663/16386/17378; iblk 1, heap 1663/16386/16518;
Aug 22 11:44:52.085486 info node-1 coredumper: Generating core file
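The CONTEXT line identifies the index and heap being replayed as tablespace OID / database OID / relfilenode triples (1663 is the OID of the default pg_default tablespace). The following is a minimal sketch, assuming the exact 9.3 wording shown in the log above, that extracts those fields so the relations can be looked up in pg_class. The names `parse_redo_delete`, `RedoDeleteContext` and `lookup_sql` are hypothetical helpers, not part of PostgreSQL.

```python
import re
from typing import NamedTuple, Optional


class RedoDeleteContext(NamedTuple):
    """Fields of an 'xlog redo delete' CONTEXT line (hypothetical helper type)."""
    index_spc: int  # tablespace OID of the btree index
    index_db: int   # database OID of the index
    index_rel: int  # relfilenode of the index
    iblk: int       # index block number being replayed
    heap_spc: int   # tablespace OID of the heap (table)
    heap_db: int    # database OID of the heap
    heap_rel: int   # relfilenode of the heap


# Assumes the message format seen in the log above; other versions may differ.
_PATTERN = re.compile(
    r"xlog redo delete: index (\d+)/(\d+)/(\d+); iblk (\d+), "
    r"heap (\d+)/(\d+)/(\d+);"
)


def parse_redo_delete(line: str) -> Optional[RedoDeleteContext]:
    """Return the parsed fields, or None if the line is not a redo-delete CONTEXT."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return RedoDeleteContext(*(int(g) for g in m.groups()))


def lookup_sql(relfilenode: int) -> str:
    """Build a query to run while connected to the database with the parsed OID."""
    return f"SELECT relname FROM pg_class WHERE relfilenode = {relfilenode};"


if __name__ == "__main__":
    ctx = parse_redo_delete(
        "CONTEXT: xlog redo delete: index 1663/16386/17378; "
        "iblk 1, heap 1663/16386/16518;"
    )
    if ctx is not None:
        print(lookup_sql(ctx.index_rel))
        print(lookup_sql(ctx.heap_rel))
```

Note that relfilenode can differ from the relation's OID after operations such as VACUUM FULL or REINDEX, which is why the sketch queries pg_class.relfilenode rather than pg_class.oid.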

The standby recovers automatically on the next restart, because we always take a fresh copy of the database from the active node on restart.

We implemented a patch to force-kill the walsender on the active side. This is done to avoid a prolonged wait if the standby node is not reachable (e.g., after a forced power-off or LAN cable removal). This implementation has existed for a long time; however, the issue has only been observed recently, after upgrading to 9.3.11. Do you think this force-kill of the walsender might lead to such issues in the latest Postgres?

Regards,
Sandhya

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Tuesday, August 30, 2016 5:09 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya(dot)k_s(at)nokia(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org; Itnal, Prakash (Nokia - IN/Bangalore) <prakash(dot)itnal(at)nokia(dot)com>
Subject: Re: [HACKERS] Postgres abort found in 9.3.11

"K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com> writes:
> During the server restart, we are getting postgres crash with sigabrt. No other operation being performed.
> Attached the backtrace.

What shows up in the postmaster log?

> The occurrence is occasional. The issue is seen once in 30~50 times.

Does it successfully restart if you try again? If not, what are you
doing to recover?

regards, tom lane

In response to

Re: Postgres abort found in 9.3.11 at 2016-08-30 11:39:16 from Tom Lane

Responses

Re: Postgres abort found in 9.3.11 at 2016-09-01 13:49:16 from Tom Lane
