Re: FSM corruption leading to errors

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM corruption leading to errors
Date: 2016-10-17 07:14:42
Message-ID: CABOikdNXbhebE5JHAN8k4ZCJ6_DR6oOUfz-MavYCZfbP9tdpwQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 11, 2016 at 5:20 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:

>
> > Once the underlying bug is fixed, I don't see why it should break again. I
> > added the above code mostly to deal with already corrupt FSMs. Maybe we can
> > just document it and leave it to the user to run some correctness checks
> > (see below), especially given that the code is not cheap and adds overhead
> > for everybody, irrespective of whether they have, or will ever have, a
> > corrupt FSM.
>
> Yep. I'd leave it to the release notes to hold a diagnostic method.
> That's annoying, but this has been done in the past, e.g. for the
> multixact issues.
>

I'm okay with that. It's annoying, especially because the bug may show up
when your primary is down and you have just failed over for HA, only to find
that the standby won't work correctly. But I don't have any ideas on how to
fix existing corruption without adding a significant penalty to the normal
path.

>
> What if you restart the standby, and then do a diagnostic query?
> Wouldn't that be enough? (Something just based on
> pg_freespace(pg_relation_size(oid) / block_size) != 0)
>
>
Yes, that will be enough once the fix is in place.
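For illustration, a minimal Python model of the check Michael sketches (not PostgreSQL code). It assumes the pg_freespacemap extension's two-argument form pg_freespace(rel, blkno) and the default 8192-byte block size; on a live system the predicate would presumably look something like pg_freespace(oid, pg_relation_size(oid) / current_setting('block_size')::int) != 0. The function names and the dict standing in for the FSM are hypothetical. Because block numbers start at 0, size / block_size is the first block past the end of the heap, so nonzero free space reported there suggests the FSM was not truncated along with the heap. Like the quoted query, it inspects only that one block.

```python
from typing import Mapping

# Assumption: default 8 kB block size; check current_setting('block_size').
DEFAULT_BLOCK_SIZE = 8192


def first_block_past_end(relation_size_bytes: int,
                         block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Number of the block just past the heap's end (blocks are 0-based)."""
    return relation_size_bytes // block_size


def fsm_points_past_end(relation_size_bytes: int,
                        fsm_free_space: Mapping[int, int],
                        block_size: int = DEFAULT_BLOCK_SIZE) -> bool:
    """True if the FSM still advertises free space for a nonexistent block.

    fsm_free_space is a hypothetical snapshot mapping block number to the
    free space the FSM reports (what pg_freespace(rel, blkno) would return);
    blocks with no entry are treated as 0.
    """
    past_end = first_block_past_end(relation_size_bytes, block_size)
    return fsm_free_space.get(past_end, 0) != 0


if __name__ == "__main__":
    # Heap truncated to 2 blocks, but the standby's FSM was not truncated
    # and still reports free space for blocks 2 and 3.
    stale_fsm = {0: 0, 1: 32, 2: 8160, 3: 8160}
    print(fsm_points_past_end(2 * DEFAULT_BLOCK_SIZE, stale_fsm))  # True
    # Healthy case: nothing recorded beyond the end of the heap.
    print(fsm_points_past_end(2 * DEFAULT_BLOCK_SIZE, {0: 0, 1: 32}))  # False
```

Checking only the first block past the end keeps the query cheap, but a stale entry that happens to be 0 at that block while later blocks are nonzero would be missed.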

I think this is a major bug, and I would appreciate any ideas on getting the
patch into committable shape before the next minor release goes out. We
probably need a committer to take an interest in this to make progress.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
