Re: Seek failure at end of FSM file during WAL replay (in 11)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Seek failure at end of FSM file during WAL replay (in 11)
Date: 2019-07-24 17:30:42
Message-ID: 31570.1563989442@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Paquier <michael(at)paquier(dot)xyz> writes:
> Recently, one of the test beds we use has blown up once when doing
> streaming replication like that:
> FATAL: could not seek to end of file "base/16386/19817_fsm": No such
> file or directory
> CONTEXT: WAL redo at 60/8DA22448 for Heap2/CLEAN: remxid 65751197
> LOG: startup process (PID 44886) exited with exit code 1

> All the WAL records have been wiped out since, so I don't know exactly
> what happened, but I could track down that this FSM file got removed
> a couple of hours before as I got my hands on some FS-level logs which
> showed a deletion.

Hm. AFAICS the immediate issuer of the error must have been
_mdnblocks(); there are other matches to that error string but
they are in places where we can tell which file the seek must
have been applied to, and it wasn't an FSM file.

> Before blaming a lower level of
> the application stack, I am wondering if we have some issues with
> mdfd_vfd meaning that the file has been removed but that it is still
> tracked as opened.

lseek() per se presumably would never return ENOENT. A more likely
theory is that the file wasn't actually open but only had a leftover
VFD entry, and when FileSize() -> FileAccess() tried to open it,
the open failed with ENOENT --- but _mdnblocks() would still call it
a seek failure.
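
To illustrate that theory, here is a minimal standalone sketch, not the actual fd.c/md.c code. ToyVfd, toy_file_access(), toy_file_size() and toy_demo() are hypothetical names invented for this example, and it assumes a POSIX system with a writable /tmp. It only shows how a lazy reopen inside a "size" call can surface ENOENT that the caller then reports as a seek failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a virtual file descriptor: the path is
 * remembered, but the kernel fd may have been closed (fd == -1).
 */
typedef struct ToyVfd
{
    const char *path;
    int         fd;
} ToyVfd;

/* Reopen the file if only the bookkeeping entry is left. */
static int
toy_file_access(ToyVfd *v)
{
    assert(v != NULL);
    if (v->fd < 0)
        v->fd = open(v->path, O_RDWR);  /* fails with ENOENT if the file is gone */
    return (v->fd < 0) ? -1 : 0;
}

/*
 * Report the file size by seeking to the end.  A failed reopen and a
 * failed lseek() both come back as -1, so the caller cannot tell them apart.
 */
static off_t
toy_file_size(ToyVfd *v)
{
    if (toy_file_access(v) < 0)
        return (off_t) -1;
    return lseek(v->fd, 0, SEEK_END);
}

/* Returns the errno the caller sees, 0 on success, -1 if setup failed. */
static int
toy_demo(void)
{
    char    path[] = "/tmp/toy_fsm_XXXXXX";
    int     fd = mkstemp(path);
    ToyVfd  v;

    if (fd < 0)
        return -1;
    close(fd);              /* kernel fd released; only the entry remains */
    v.path = path;
    v.fd = -1;
    unlink(path);           /* file removed behind the entry's back */

    errno = 0;
    if (toy_file_size(&v) < 0)
    {
        int     saved_errno = errno;

        /* The caller words this as a seek failure, whatever really failed. */
        fprintf(stderr, "could not seek to end of file \"%s\": %s\n",
                path, strerror(saved_errno));
        return saved_errno;
    }
    close(v.fd);
    return 0;
}
```

Calling toy_demo() from a main() should print a "could not seek to end of file" message carrying the ENOENT text, even though lseek() was never reached.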

So I'd opine that this is a pretty high-level failure --- what are
we doing trying to replay WAL against a table that's been dropped?
Or if it wasn't dropped, why was the FSM removed?

regards, tom lane
