Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
Date: 2021-06-22 13:00:31
Message-ID: CA+hUKGJ98=MOjDCnMAC5gSpkzrrey=O+aEQJ1OY03C=cVtiwkA@mail.gmail.com
Lists: pgsql-bugs

On Tue, Jun 22, 2021 at 9:30 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Hmm, the simplest explanation would be that the read() or write() on the
> relmapper file is not atomic. We assume that it is, and don't use a lock
> in load_relmap_file() because of that. Is there anything unusual about
> the filesystem, mount options or the kernel you're using? I could not
> reproduce this on my laptop. Does the attached patch fix it for you?

I have managed to reproduce this twice on a laptop running Linux
5.10.0-2-amd64, after trying many things for several hours. Both
times I was using ext4 in a loopback file with fsync=off. (The
underlying filesystem is xfs. I had no luck reproducing it there,
hence my hunch to try ext4, though that may not be significant; the
same goes for fsync=off.)

> If that's the cause, it is easy to fix by taking the RelationMappingLock
> in load_relmap_file(), like in the attached patch. But if the write is
> not atomic, you might have a bigger problem: we also rely on the
> atomicity when writing the pg_control file. If that becomes corrupt
> because of a partial write, the server won't start up. If it's just a
> race condition between the read/write, or only the read() is not atomic,
> maybe pg_control is OK, but I'd like to understand the issue better
> before just adding a lock to load_relmap_file().

Your analysis seems right to me. We have to worry about both things:
atomicity of writes on power failure (assumed to be sector-level,
hence our 512-byte struct, so that part is fine), and atomicity of
concurrent reads and writes (we can't assume anything at all there, so
reader/writer locking is the simplest way to get a consistent read).
Shouldn't relmap_redo() also acquire the lock exclusively?
