Re: Postgres abort found in 9.3.11

From: "K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Itnal, Prakash (Nokia - IN/Bangalore)" <prakash(dot)itnal(at)nokia(dot)com>
Subject: Re: Postgres abort found in 9.3.11
Date: 2016-09-01 10:45:51
Message-ID: DB5PR07MB154156B5B062C8769E8A569ED6E20@DB5PR07MB1541.eurprd07.prod.outlook.com
Lists: pgsql-hackers

Hello Tom,

Apologies for the delayed reply.

Our setup is a hot-standby architecture. The crash occurs only on the standby node; Postgres continues to run without any issues on the active node.
The postmaster is waiting to start and is throwing these messages:

Aug 22 11:44:21.462555 info node-0 postgres[8222]: [1-2] HINT: Is another postmaster already running on port 5433? If not, wait a few seconds and retry.
Aug 22 11:44:52.065760 crit node-1 postgres[8629]: [18-1] err-3: btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data
Aug 22 11:44:52.065971 crit CFPU-1 postgres[8629]: [18-2] CONTEXT: xlog redo delete: index 1663/16386/17378; iblk 1, heap 1663/16386/16518;
Aug 22 11:44:52.085486 info node-1 coredumper: Generating core file

The standby Postgres recovers automatically on the next restart, because we always take a fresh copy of the database from the active node on restart.

We implemented a patch to force-kill the walsender on the active side. This avoids a prolonged wait when the standby node is unreachable (e.g. forced power-off or LAN cable removal). The patch has been in place for a long time; however, the issue has been observed only recently, after upgrading to 9.3.11. Do you think force-killing the walsender might lead to such issues in the latest Postgres?

Regards,
Sandhya

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Tuesday, August 30, 2016 5:09 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya(dot)k_s(at)nokia(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org; Itnal, Prakash (Nokia - IN/Bangalore) <prakash(dot)itnal(at)nokia(dot)com>
Subject: Re: [HACKERS] Postgres abort found in 9.3.11

"K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com> writes:
> During the server restart, we are getting postgres crash with sigabrt. No other operation being performed.
> Attached the backtrace.

What shows up in the postmaster log?

> The occurrence is occasional. The issue is seen once in 30~50 times.

Does it successfully restart if you try again? If not, what are you
doing to recover?

regards, tom lane
