Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"

From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
Date: 2021-06-22 17:00:04
Message-ID: 11523fe8-7614-9d57-1ad5-c12a4c4ec9cf@gmail.com
Lists: pgsql-bugs

Hello,
22.06.2021 16:00, Thomas Munro wrote:
> On Tue, Jun 22, 2021 at 9:30 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Hmm, the simplest explanation would be that the read() or write() on the
>> relmapper file is not atomic. We assume that it is, and don't use a lock
>> in load_relmap_file() because of that. Is there anything unusual about
>> the filesystem, mount options or the kernel you're using? I could not
>> reproduce this on my laptop. Does the attached patch fix it for you?
> I have managed to reproduce this twice on a laptop running Linux
> 5.10.0-2-amd64, after trying many things for several hours. Both
> times I was using ext4 in a loopback file (the underlying filesystem
> is xfs; I had no luck there, hence my hunch that I should try ext4,
> though that may not be significant) with fsync=off (ditto).
I'm sorry, I forgot to mention that I had set "fsync=off" in my
postgresql.conf (to avoid an NVMe-specific slowdown on fsyncs).
It really does matter: with fsync=on, the demo script passes 20
iterations successfully.
I can reproduce the issue on Ubuntu 20.04 with kernel 5.9.15, ext4
(no special mount options) on NVMe storage, and a Ryzen 3700X.
It was first encountered on Debian 10 with kernel 4.19.0, ext4 on
software RAID, also built on NVMe storage, and a Xeon 5220.

The attached patch fixes it for me (with fsync=off). Three runs of 20
iterations each completed without the error (without the patch, I get
the error on the first iteration).
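For readers following the thread, here is a small in-memory sketch of Heikki's hypothesis quoted above. If a read of the relmapper file is not atomic with respect to a concurrent write, the reader can see part of the old contents and part of the new ones, and the stored checksum no longer matches the payload. This is only an analogue under stated assumptions: the 64-entry layout, the use of `zlib.crc32`, and the names `build_image`, `is_valid`, `torn_read` and `LockedMap` are all hypothetical and are not PostgreSQL's real file format or code. The locked variant simply illustrates the general idea of serializing readers against the writer; it is not a description of what the attached patch does.

```python
import struct
import threading
import zlib

# Hypothetical stand-in for a relmapper-style file: a fixed-size payload
# followed by a 4-byte checksum. Not PostgreSQL's real on-disk layout.
N_ENTRIES = 64
PAYLOAD_FMT = f"<{N_ENTRIES}I"
PAYLOAD_SIZE = struct.calcsize(PAYLOAD_FMT)  # 256 bytes


def build_image(version: int) -> bytes:
    """Serialize one 'version' of the map with a CRC-32 of the payload appended."""
    payload = struct.pack(PAYLOAD_FMT, *(version * 1000 + i for i in range(N_ENTRIES)))
    return payload + struct.pack("<I", zlib.crc32(payload))


def is_valid(image: bytes) -> bool:
    """Recompute the payload checksum and compare it with the stored one."""
    payload, stored = image[:PAYLOAD_SIZE], image[PAYLOAD_SIZE:]
    return struct.unpack("<I", stored)[0] == zlib.crc32(payload)


def torn_read(old: bytes, new: bytes, split: int) -> bytes:
    """Simulate a non-atomic read: the first `split` bytes come from the old
    image and the rest (including the checksum) from the new one."""
    return old[:split] + new[split:]


class LockedMap:
    """Writer and readers take the same lock, so a reader can only observe a
    complete image. threading.Lock is exclusive; a shared/exclusive lock
    would let readers proceed concurrently with each other."""

    def __init__(self, image: bytes):
        self._lock = threading.Lock()
        self._buf = bytearray(image)

    def write(self, image: bytes) -> None:
        with self._lock:
            # Copy in 16-byte chunks to mimic a multi-step write; the lock
            # hides the intermediate states from readers.
            for off in range(0, len(image), 16):
                self._buf[off:off + 16] = image[off:off + 16]

    def read(self) -> bytes:
        with self._lock:
            return bytes(self._buf)


def run_concurrent(iterations: int = 2000) -> int:
    """Run one writer and one reader concurrently and return how many reads
    failed checksum verification."""
    shared = LockedMap(build_image(0))
    bad = 0

    def writer() -> None:
        for v in range(1, iterations + 1):
            shared.write(build_image(v))

    def reader() -> None:
        nonlocal bad
        for _ in range(iterations):
            if not is_valid(shared.read()):
                bad += 1

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bad
```

For example, `is_valid(torn_read(build_image(1), build_image(2), PAYLOAD_SIZE // 2))` is `False`, while `run_concurrent()` returns 0 because every read happens under the same lock as the writes. The torn image is constructed deterministically here; on a real system whether and how often such a mix is observed depends on the kernel and filesystem, which is exactly what the reports above are probing.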

Best regards,
Alexander
