Re: FSM corruption leading to errors

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FSM corruption leading to errors
Date: 2016-10-21 03:15:43
Message-ID: CAB7nPqSgxvGnxNmeUwMPuD-8irwMZ6T+OtOqwTCPnUTNnHvKMw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 20, 2016 at 3:37 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Just to clarify, I meant if we truncate the entire FSM then we'll need API
> to truncate VM as well so that VACUUM rebuilds everything completely. OTOH
> if we provide a function just to truncate FSM to match the size of the
> table, then we don't need to rebuild the FSM. So that's probably a better
> way to handle FSM corruption, as far as this particular issue is concerned.

To be honest, I think that just documenting in the release notes the method
that does not involve the use of any extra extension or SQL routine is
fine. So we could just tell users to:
1) Run something like the query you gave upthread, which gives the user
a list of the corrupted FSM files, and add this query to the
release notes.
2) If anything is found, stop the server and delete those files manually.
3) Restart the server.
OK, that's troublesome and costly for large relations, but we know
that's the safest way to go for all versions, and there is no need to
complicate the code with any one-time repair extensions.
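A minimal Python sketch of step 2, assuming the detection query returns
pg_relation_filepath() values, i.e. paths relative to the data directory such
as base/16384/16385. The function name remove_fsm_forks and its dry_run flag
are hypothetical, not part of PostgreSQL or of the attached extension. It
relies on the on-disk naming of the FSM fork (<relfilenode>_fsm, plus
<relfilenode>_fsm.N for any extra segments) and refuses to run while
postmaster.pid is present:

```python
import re
from pathlib import Path
from typing import Iterable, List


def remove_fsm_forks(data_dir: str, rel_paths: Iterable[str],
                     dry_run: bool = True) -> List[Path]:
    """Delete the FSM fork files of the given relations (hypothetical helper).

    rel_paths are assumed to be pg_relation_filepath() outputs, relative to
    data_dir. Only the "<relfilenode>_fsm" file and its "<relfilenode>_fsm.N"
    segments are touched; the main fork and the VM fork are left alone.
    With dry_run=True nothing is deleted, the planned list is just returned.
    """
    root = Path(data_dir)
    # Conservative guard: postmaster.pid exists while the server is running.
    if (root / "postmaster.pid").exists():
        raise RuntimeError("postmaster.pid found: stop the server first")

    removed: List[Path] = []
    for rel in rel_paths:
        main_fork = root / rel
        fsm_name = re.compile(re.escape(main_fork.name + "_fsm") + r"(\.\d+)?")
        for candidate in sorted(main_fork.parent.glob(main_fork.name + "_fsm*")):
            if fsm_name.fullmatch(candidate.name):
                if not dry_run:
                    candidate.unlink()
                removed.append(candidate)
    return removed
```

Run it first with dry_run=True to review the list, then for real, restart the
server and let VACUUM rebuild the FSM.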

Speaking of which, I implemented a small extension, attached, able to
truncate the FSM to match the size of the relation, but looking at it,
SMGR_TRUNCATE_FSM was only introduced in 9.6, so its range of action
is rather limited... I also pushed a version to GitHub:
https://github.com/michaelpq/pg_plugins/tree/master/pg_fix_truncation
The limited applicability of such an extension is a good enough argument
to just rely on the stop/delete-FSM/start method to fix an instance and
let VACUUM do the rest of the work. The extension seems to work, but use
it at your own risk.
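To illustrate what truncating the FSM to match the size of the relation
means, here is a toy Python model. It is not the extension's code: the FSM is
modeled as a plain dict from heap block number to advertised free bytes (the
real FSM is an on-disk tree of pages), and pick_block, stale_entries and
truncate_fsm are hypothetical helpers. It shows how an entry left over for a
block past the end of the relation can send an insert to a block that no
longer exists:

```python
from typing import Dict, List, Optional

# Toy FSM: heap block number -> free bytes advertised for that block.
FreeSpaceMap = Dict[int, int]


def pick_block(fsm: FreeSpaceMap, needed: int) -> Optional[int]:
    """Lowest block the toy FSM claims has at least `needed` free bytes."""
    candidates = [blk for blk, avail in fsm.items() if avail >= needed]
    return min(candidates) if candidates else None


def stale_entries(fsm: FreeSpaceMap, rel_nblocks: int) -> List[int]:
    """Blocks at or past the relation's end that still advertise free space."""
    return sorted(blk for blk, avail in fsm.items()
                  if blk >= rel_nblocks and avail > 0)


def truncate_fsm(fsm: FreeSpaceMap, rel_nblocks: int) -> FreeSpaceMap:
    """Keep only the entries for blocks that still exist in the relation."""
    return {blk: avail for blk, avail in fsm.items() if blk < rel_nblocks}


# The relation was truncated to 4 blocks, but the FSM was not.
fsm = {0: 0, 1: 0, 2: 0, 3: 0, 7: 500}
print(pick_block(fsm, 100))                   # 7, past the end of the relation
print(stale_entries(fsm, 4))                  # [7]
print(pick_block(truncate_fsm(fsm, 4), 100))  # None: no candidate block left
```

Deleting the FSM file entirely, as in the stop/delete/start method, is the
extreme case of the same idea: the map starts empty and VACUUM refills it.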

This bug would be a good blog topic by the way...
--
Michael

Attachment Content-Type Size
pg_fix_truncation.tar.gz application/x-gzip 1.9 KB
