Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
Date: 2021-06-22 14:11:06
Message-ID: 1715251.1624371066@sss.pgh.pa.us
Lists: pgsql-bugs

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> Your analysis seems right to me. We have to worry about both things:
> atomicity of writes on power failure (assumed to be sector-level,
> hence our 512 byte struct -- all good), and atomicity of concurrent
> reads and writes (we can't assume anything at all, so r/w locking is
> the simplest way to get a consistent read). Shouldn't relmap_redo()
> also acquire the lock exclusively?

Shouldn't we instead file a kernel bug report? I seem to recall that
POSIX guarantees atomicity of these things up to some operation size.
Or is that just for pipe I/O?

If we can't assume atomicity of relmapper file I/O, I wonder about
pg_control as well. But on the whole, what I'm smelling is a moderately
recently introduced kernel bug. We've been doing this this way for
years and heard no previous reports.

regards, tom lane
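
For readers following the thread, the r/w locking idea quoted above can be sketched roughly as follows. This is not PostgreSQL's relmapper code: the names `DemoMapFile`, `demo_map_write`, `demo_map_read`, and `DEMO_MAGIC` are hypothetical, the checksum is a plain FNV-1a hash standing in for the real CRC, and POSIX `fcntl` advisory locks stand in for whatever lock the server would actually take. The only details taken from the discussion are the 512-byte struct and the goal of never letting a reader see a half-written file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define DEMO_FILE_SIZE 512          /* one traditional disk sector */
#define DEMO_MAGIC     0x592717u    /* hypothetical magic number */

/* Hypothetical on-disk layout: magic, payload, trailing checksum. */
typedef struct DemoMapFile
{
    uint32_t magic;
    uint8_t  payload[DEMO_FILE_SIZE - 2 * sizeof(uint32_t)];
    uint32_t checksum;
} DemoMapFile;

_Static_assert(sizeof(DemoMapFile) == DEMO_FILE_SIZE,
               "struct must fill exactly one 512-byte sector");

/* FNV-1a over everything before the checksum field (stand-in for a CRC). */
static uint32_t demo_checksum(const DemoMapFile *m)
{
    const uint8_t *p = (const uint8_t *) m;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < offsetof(DemoMapFile, checksum); i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Block until a whole-file advisory lock of the given type is granted. */
static int demo_lock(int fd, int type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = (short) type;       /* F_RDLCK or F_WRLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                   /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Writer: exclusive lock, one 512-byte write, then fsync. */
int demo_map_write(const char *path, const void *data, size_t len)
{
    DemoMapFile m;
    int fd, rc = -1;

    if (len > sizeof(m.payload))
        return -1;
    memset(&m, 0, sizeof(m));
    m.magic = DEMO_MAGIC;
    memcpy(m.payload, data, len);
    m.checksum = demo_checksum(&m);

    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (demo_lock(fd, F_WRLCK) == 0 &&
        pwrite(fd, &m, sizeof(m), 0) == (ssize_t) sizeof(m) &&
        fsync(fd) == 0)
        rc = 0;
    close(fd);                      /* closing the descriptor drops the lock */
    return rc;
}

/* Reader: shared lock, so the read cannot overlap a locked write.
 * Returns 0 on success, -1 on I/O failure, -2 on magic/checksum mismatch. */
int demo_map_read(const char *path, DemoMapFile *out)
{
    ssize_t n = -1;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (demo_lock(fd, F_RDLCK) == 0)
        n = pread(fd, out, sizeof(*out), 0);
    close(fd);

    if (n != (ssize_t) sizeof(*out))
        return -1;
    if (out->magic != DEMO_MAGIC || out->checksum != demo_checksum(out))
        return -2;
    return 0;
}
```

The lock addresses only the concurrent read/write case; the power-failure case still rests on the assumption that a single 512-byte sector write is atomic. Advisory locks protect only processes that take them, so every reader and writer (including any redo path) would have to go through these functions for the scheme to hold.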
