From: | Dimitrios Apostolou <jimis(at)gmx(dot)net> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Subject: | Re: [PING] fallocate() causes btrfs to never compress postgresql files |
Date: | 2025-05-29 15:57:40 |
Message-ID: | 4aa3d83d-9630-61ac-85a7-a55490be49a6@gmx.net |
Lists: | pgsql-hackers |
On Wed, 28 May 2025, Tomas Vondra wrote:
>
> Isn't guaranteeing success of a write a general issue with compressed
> filesystem? Why is posix_fallocate() any special in this regard?
> Shouldn't the filesystem be defensive and assume the data is not
> compressible? Or maybe just return EOPNOTSUPP when in doubt.
It's not simple for CoW filesystems, including Btrfs and ZFS. As far as I
know, the current design is a compromise; it's not that the developers
are happy with it. I can point you to some discussion, with pointers to
further discussions, if you are interested:
https://marc.info/?l=linux-btrfs&m=174310663519516&w=2
>> BTW even in the last case, PostgreSQL would not notice the lack of
>> fallocate() support as glibc implements a userspace fallback in
>> posix_fallocate(). That fallback has its own issues that hopefully will
>> not affect postgres (see CAVEATS in man 3 posix_fallocate).
>>
>
> Well, if btrfs starts returning EOPNOTSUPP, and glibc switches to the
> userspace fallback, we wouldn't notice. But that's up to the btrfs to
> decide if they want to support fallocate. We still need our fallback
> anyway, because of other OSes.
Btrfs decided a few years back: they will "support" fallocate, but
because real support is very difficult, they disable compression (among
other things) for files with fallocate'd ranges. They can't change that
and start returning EOPNOTSUPP out of the blue now, but they are open to
adding a mount option to optionally do that:
https://marc.info/?l=linux-btrfs&m=174310663519516&w=2
>> Should PostgreSQL provide a setting to avoid the use of fallocate()? Or is
>> it the filesystem at fault for not returning EOPNOTSUPP, in which case
>> postgres would use its fallback code?
>>
>
> I don't have a clear opinion on whether it's a filesystem issue. Maybe
> we should be handling this differently, not sure.
All I'm saying is that this is a regression for PostgreSQL users who keep
tablespaces on compressed Btrfs. What could be done on the postgres side is
to provide a runtime setting for avoiding fallocate(), going through the
old code path instead. Ideally this would be a per-tablespace option, but
even a global one is better than nothing.
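To make the proposal a bit more concrete, here is a minimal sketch in C of
what such a switch could look like. It is not PostgreSQL's actual code: the
names use_fallocate, extend_file and extend_with_zeros are hypothetical, and
I am assuming the "old code path" amounts to extending the file by writing
zeros.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical runtime setting; NOT an existing PostgreSQL option. */
static bool use_fallocate = true;

/*
 * Extend the file by writing zeros (assumed stand-in for the old code path).
 * Returns 0 on success or an errno value on failure.
 */
static int
extend_with_zeros(int fd, off_t offset, off_t len)
{
    char        buf[8192];

    memset(buf, 0, sizeof(buf));
    while (len > 0)
    {
        size_t      chunk = len < (off_t) sizeof(buf) ? (size_t) len : sizeof(buf);
        ssize_t     written = pwrite(fd, buf, chunk, offset);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return errno;
        }
        if (written == 0)
            return ENOSPC;      /* no progress; give up instead of looping */
        offset += written;
        len -= written;
    }
    return 0;
}

/*
 * Extend the file, honouring the hypothetical use_fallocate setting.
 * Returns 0 on success or an errno value on failure.
 */
static int
extend_file(int fd, off_t offset, off_t len)
{
    if (use_fallocate)
    {
        /* posix_fallocate() returns the error number; it does not set errno */
        int         rc = posix_fallocate(fd, offset, len);

        if (rc != EOPNOTSUPP)
            return rc;
        /* not supported here: fall through to writing zeros */
    }
    return extend_with_zeros(fd, offset, len);
}
```

With use_fallocate set to false, a call like extend_file(fd, 0, 20000)
never reaches posix_fallocate(), so on Btrfs the file would not get
fallocate'd ranges. Note that with glibc the EOPNOTSUPP branch is rarely
taken, because of the userspace fallback mentioned earlier.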
Thanks,
Dimitris