From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Seek failure at end of FSM file during WAL replay (in 11) |
Date: | 2019-07-24 17:30:42 |
Message-ID: | 31570.1563989442@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Michael Paquier <michael(at)paquier(dot)xyz> writes:
> Recently, one of the test beds we use has blown up once when doing
> streaming replication like that:
> FATAL: could not seek to end of file "base/16386/19817_fsm": No such
> file or directory
> CONTEXT: WAL redo at 60/8DA22448 for Heap2/CLEAN: remxid 65751197
> LOG: startup process (PID 44886) exited with exit code 1
> All the WAL records have been wiped out since, so I don't know exactly
> what happened, but I could track down that this FSM file got removed
> a couple of hours before as I got my hands on some FS-level logs which
> showed a deletion.
Hm. AFAICS the immediate issuer of the error must have been
_mdnblocks(); there are other matches to that error string but
they are in places where we can tell which file the seek must
have been applied to, and it wasn't an FSM file.
> Before blaming a lower level of
> the application stack, I am wondering if we have some issues with
> mdfd_vfd meaning that the file has been removed but that it is still
> tracked as opened.
lseek() per se presumably would never return ENOENT. A more likely
theory is that the file wasn't actually open but only had a leftover
VFD entry, and when FileSize() -> FileAccess() tried to open it,
the open failed with ENOENT --- but _mdnblocks() would still call it
a seek failure.
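To make that theory concrete, here is a minimal standalone C sketch of the suspected sequence. It is not the fd.c or md.c code: ToyVfd, toy_file_access(), toy_file_size() and the two demo functions are hypothetical names invented for illustration, and the error message is only modeled on the one in the report. It assumes a POSIX system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a virtual file descriptor: the path is kept so
 * the kernel fd can be released and transparently reopened later. */
typedef struct ToyVfd
{
    const char *path;
    int         fd;     /* -1 means "entry exists, but not physically open" */
} ToyVfd;

/* Reopen the file if the kernel fd was released. Returns 0, or -1 with errno. */
static int
toy_file_access(ToyVfd *v)
{
    assert(v->path != NULL);
    if (v->fd >= 0)
        return 0;
    v->fd = open(v->path, O_RDWR);
    return (v->fd >= 0) ? 0 : -1;
}

/* Size via lseek(SEEK_END). A failed reopen comes back as the same -1,
 * so the caller cannot tell "open failed" from "seek failed". */
static off_t
toy_file_size(ToyVfd *v)
{
    if (toy_file_access(v) < 0)
        return (off_t) -1;
    return lseek(v->fd, 0, SEEK_END);
}

/* Case 1: leftover entry, kernel fd closed, file removed underneath.
 * Returns the errno the caller sees (ENOENT), or 0 if nothing failed. */
int
demo_stale_vfd(const char *path)
{
    ToyVfd v = {path, -1};

    v.fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (v.fd < 0)
        return -1;
    close(v.fd);            /* kernel fd released; the entry stays around */
    v.fd = -1;
    unlink(path);           /* file disappears behind the entry's back */

    errno = 0;
    if (toy_file_size(&v) < 0)
    {
        int saved = errno;

        /* Reported as a seek failure although open() is what failed. */
        fprintf(stderr, "could not seek to end of file \"%s\": %s\n",
                path, strerror(saved));
        return saved;
    }
    close(v.fd);
    return 0;
}

/* Case 2: the fd is still physically open when the file is unlinked.
 * lseek() keeps working; returns the size it reports, or -1 on error. */
long
demo_open_fd_survives_unlink(const char *path)
{
    ToyVfd v = {path, -1};
    off_t  size;

    v.fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (v.fd < 0)
        return -1;
    if (write(v.fd, "hello", 5) != 5)
    {
        close(v.fd);
        unlink(path);
        return -1;
    }
    unlink(path);
    size = toy_file_size(&v);
    close(v.fd);
    return (long) size;
}
```

Calling demo_stale_vfd() with a scratch path returns ENOENT and prints a "could not seek" message, while demo_open_fd_survives_unlink() returns 5, since a seek on a descriptor that is still open does not care that the directory entry is gone.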
So I'd opine that this is a pretty high-level failure --- what are
we doing trying to replay WAL against a table that's been dropped?
Or if it wasn't dropped, why was the FSM removed?
regards, tom lane