Re: What to do when dynamic shared memory control segment is corrupt

From: Sherrylyn Branchaw <sbranchaw(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pg(at)bowt(dot)ie, Andres Freund <andres(at)anarazel(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: What to do when dynamic shared memory control segment is corrupt
Date: 2018-06-19 15:43:25
Message-ID: CAB_myF4bWKaqSf7XGBdu=9tk=fH8AF+ZFTrcMhj2W--NKinB1g@mail.gmail.com
Lists: pgsql-general

> Yeah, I'd like to know that too. The complaint about corrupt shared
> memory may be just an unrelated red herring, or it might be a separate
> effect of whatever the primary failure was ... but I think it was likely
> not the direct cause of the failure-to-restart.

> Anyway, I would not be afraid to try restarting the postmaster manually
> if it died. Maybe don't do that repeatedly without human intervention;
> but PG is pretty robust against crashes. We developers crash it all the
> time, and we don't lose data.

Understood, and thanks. I was basing my concern on a message in the mailing
lists that suggested that postgres might fail to start up in the event of a
corrupted memory segment. I would link to the message directly, but I keep
getting backend server error messages when I try to search for it today. At
any rate, it looked like there was a chance that it was a deliberate design
choice, and I didn't want to ignore it if so. It's good to know that this
is not the case.

> I realize that you're most focused on less-downtime, but from my
> perspective it'd be good to worry about collecting evidence as to
> what happened exactly.

Absolutely. I would love to know why this is happening too. However, our
priorities have been set in part by a very tight deadline handed down from
the C-levels to migrate to Aurora, so we have to focus our energies
accordingly. I will be back with core files if this happens again before
we're completely migrated over. Meanwhile, thank you for assuring me we
have no current data corruption and that it's safe to restart next time
without taking additional action to avoid or detect corruption.

Best,
Sherrylyn
