Re: [PING] fallocate() causes btrfs to never compress postgresql files

From: Dimitrios Apostolou <jimis(at)gmx(dot)net>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Subject: Re: [PING] fallocate() causes btrfs to never compress postgresql files
Date: 2025-05-29 15:57:40
Message-ID: 4aa3d83d-9630-61ac-85a7-a55490be49a6@gmx.net
Lists: pgsql-hackers

On Wed, 28 May 2025, Tomas Vondra wrote:
>
> Isn't guaranteeing success of a write a general issue with compressed
> filesystem? Why is posix_fallocate() special in this regard?
> Shouldn't the filesystem be defensive and assume the data is not
> compressible? Or maybe just return EOPNOTSUPP when in doubt.

It's not simple for CoW filesystems, including Btrfs and ZFS. What I do know
is that the current design is a compromise, not something the developers
are happy with. I can point you to a discussion that links to further
threads, if you are interested:

https://marc.info/?l=linux-btrfs&m=174310663519516&w=2

>> BTW even in the last case, PostgreSQL would not notice the lack of
>> fallocate() support as glibc implements a userspace fallback in
>> posix_fallocate(). That fallback has its own issues that hopefully will
>> not affect postgres (see CAVEATS in man 3 posix_fallocate).
>>
>
> Well, if btrfs starts returning EOPNOTSUPP, and glibc switches to the
> userspace fallback, we wouldn't notice. But that's up to btrfs to
> decide if they want to support fallocate. We still need our fallback
> anyway, because of other OSes.

Btrfs decided this a few years back: they "support" fallocate, but
because real support is very difficult, they disable compression (among
other things) for files with fallocate'd ranges. They can't change that now
and suddenly start returning EOPNOTSUPP, but they are open to adding a mount
option that would optionally do so:

https://marc.info/?l=linux-btrfs&m=174310663519516&w=2
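To make the EOPNOTSUPP point concrete: glibc's posix_fallocate() hides a missing native implementation by emulating it, while the Linux-specific fallocate(2) system call reports EOPNOTSUPP directly. Below is a minimal sketch, assuming Linux with glibc, of probing a scratch file that way. `native_fallocate_supported()` is a hypothetical helper and not PostgreSQL code. On today's Btrfs the probe succeeds, because the filesystem does "support" fallocate. A successful probe also really preallocates, so it should only ever be run on a throwaway file.

```c
/* Sketch (assumes Linux + glibc): ask the filesystem whether it implements
 * fallocate() natively, bypassing glibc's posix_fallocate() emulation. */
#define _GNU_SOURCE             /* exposes the Linux-specific fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the filesystem preallocated the range,
 * 0 if it reported EOPNOTSUPP (the case in which glibc's posix_fallocate()
 * would silently fall back to emulation), and -1 with errno set otherwise.
 * Caution: a successful probe really preallocates, so on Btrfs it disables
 * compression for that file. Use a scratch file only. */
static int
native_fallocate_supported(int fd, off_t len)
{
    assert(len > 0);

    /* mode 0: plain preallocation, extending the file size if needed */
    if (fallocate(fd, 0, 0, len) == 0)
        return 1;
    if (errno == EOPNOTSUPP)
        return 0;
    return -1;
}
```

If the proposed mount option existed, a probe like this would presumably return 0 on such a mount, while posix_fallocate() would still report success through glibc's emulation.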

>> Should PostgreSQL provide a setting to avoid the use of fallocate()? Or is
>> it the filesystem at fault for not returning EOPNOTSUPP, in which case
>> postgres would use its fallback code?
>>
>
> I don't have a clear opinion on whether it's a filesystem issue. Maybe
> we should be handling this differently, not sure.

All I'm saying is that this is a regression for PostgreSQL users who keep
tablespaces on compressed Btrfs. What PostgreSQL could do is provide a
runtime setting that avoids fallocate() and goes through the old code path
instead. Ideally this would be a per-tablespace option, but even a global
one is better than nothing.
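The following is a rough sketch of what such a setting could look like. It assumes the "old code path" simply writes zero-filled blocks. `avoid_fallocate`, `extend_file()`, and `extend_with_zeros()` are hypothetical names, not existing PostgreSQL settings or functions, and 8192 bytes is assumed as the block size. When the flag is set, the sketch skips posix_fallocate() entirely. Otherwise it falls back to writing zeros only when posix_fallocate() reports EINVAL or EOPNOTSUPP, which are the codes the man page lists for an unsupported operation.

```c
/* Hypothetical sketch: extend a file by `len` bytes at `offset`, either with
 * posix_fallocate() or by writing zero-filled blocks ("the old code path"). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_BLOCK_SIZE 8192    /* assumed block size (PostgreSQL's default BLCKSZ) */

/* Hypothetical runtime setting, not an existing PostgreSQL GUC. */
static bool avoid_fallocate = false;

/* Write explicit zeros. These are ordinary data writes, so they do not create
 * the preallocated ranges that make Btrfs skip compression.
 * Returns 0 on success or an errno value on failure. */
static int
extend_with_zeros(int fd, off_t offset, off_t len)
{
    static const char zeros[ZERO_BLOCK_SIZE] = {0};

    while (len > 0)
    {
        size_t  chunk = len < ZERO_BLOCK_SIZE ? (size_t) len : ZERO_BLOCK_SIZE;
        ssize_t written = pwrite(fd, zeros, chunk, offset);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            return errno;
        }
        offset += written;      /* also handles short writes */
        len -= written;
    }
    return 0;
}

/* Returns 0 on success or an errno value on failure. */
static int
extend_file(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len > 0);

    if (!avoid_fallocate)
    {
        /* posix_fallocate() returns an error number; it does not set errno */
        int rc = posix_fallocate(fd, offset, len);

        if (rc == 0)
            return 0;
        /* fall back only if preallocation is unsupported; report real errors */
        if (rc != EINVAL && rc != EOPNOTSUPP)
            return rc;
    }
    return extend_with_zeros(fd, offset, len);
}
```

A per-tablespace variant would presumably change only where the flag comes from, for example a tablespace option consulted before the global default. The extension logic itself could stay the same.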

Thanks,
Dimitris
