From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dimitrios Apostolou <jimis(at)gmx(dot)net> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
Subject: | Re: [PING] fallocate() causes btrfs to never compress postgresql files |
Date: | 2025-05-31 14:33:27 |
Message-ID: | 4453f831-bcbe-49e2-88ed-747f0abbdebb@vondra.me |
Lists: | pgsql-hackers |
On 5/31/25 16:00, Thomas Munro wrote:
> On Fri, May 30, 2025 at 3:58 AM Dimitrios Apostolou <jimis(at)gmx(dot)net> wrote:
>> All I'm saying is that this is a regression for PostgreSQL users that keep
>> tablespaces on compressed Btrfs. What could be done from postgres, is to
>> provide a runtime setting for avoiding fallocate(), going instead through
>> the old code path. Ideally this would be an option per tablespace, but even
>> a global one is better than nothing.
>
> Here's an initial sketch of such a setting. Better name, design,
> words welcome. Would need a bit more work to cover temp tables too.
> It's slightly tricky to get smgr to behave differently because of the
> contents of a system catalogue! I couldn't think of a better way than
> exposing it as a flag that the buffer manager layer has to know about
> and compute earlier, but that also seems a bit strange, as fallocate
> is a highly md.c specific concern. Hmm.
>
I find the definition of io_min_fallocate confusing, or rather that 0
means "never" instead of "always". It's described as a "threshold at
which to start using fallocate", so I'd expect 0 to mean "always"
because (len >= 0).
I suggest using "-1" to mean never and "0" to mean always, as for other similar
settings (e.g. log_min_duration_statement or log_autovacuum_min_duration).
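To make the suggested semantics concrete, here is a minimal sketch of the threshold check. Only the setting name io_min_fallocate comes from the proposed patch; the helper use_fallocate(), the FALLOCATE_NEVER macro, and the assumption that the setting and the extension length share the same unit are hypothetical and only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sentinel: -1 disables fallocate() entirely. */
#define FALLOCATE_NEVER (-1)

/*
 * Hypothetical helper, not taken from the patch: decide whether an
 * extension of "len" units should go through fallocate().
 *
 *   io_min_fallocate = -1  -> never use fallocate()
 *   io_min_fallocate =  0  -> always use it, since len >= 0 always holds
 *   io_min_fallocate =  N  -> use it only when len >= N
 */
static bool
use_fallocate(int io_min_fallocate, int len)
{
    assert(io_min_fallocate >= FALLOCATE_NEVER);
    assert(len >= 0);

    if (io_min_fallocate == FALLOCATE_NEVER)
        return false;

    return len >= io_min_fallocate;
}
```

With this definition "0" needs no special case: it falls out of the plain (len >= threshold) comparison, and only "-1" has to be handled explicitly.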
> I suppose something like the 0001 part could be back-patched if this
> is considered a serious enough problem without other workarounds, so I
> did this in two steps. I wonder if there are good reasons to want to
> change the number on other file systems. I suppose it at least allows
> experimentation.
Maybe. It'd need to get some of the 0002 bits too, ofc.
I'm not sure we really want all these special GUCs tailored for different
filesystems. We already have a few such GUCs, it's getting tricky to
know which ones to set / not set, and it also changes with the
filesystem version ... I personally don't know which ones to set, a lot
of the knowledge is somewhat outdated I think.
Wouldn't it be better for btrfs to just start returning EOPNOTSUPP
(maybe with a mount option), in which case we already do the right thing
automatically? Sure, it means the admin needs to be aware of
this in both cases.
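For illustration, the kind of fallback meant here could look roughly like the sketch below. This is not the md.c code; extend_file() and file_size() are hypothetical helpers, and it assumes a system that provides posix_fallocate() and pwrite(). The idea is simply: try to preallocate, and if the filesystem reports EOPNOTSUPP, extend the file by writing zeros instead.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make sure [offset, offset + len) exists in the file.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
extend_file(int fd, off_t offset, off_t len)
{
    char    zeros[8192];
    int     rc;

    assert(offset >= 0 && len >= 0);
    if (len == 0)
        return 0;

    /* posix_fallocate() returns the error number; it does not set errno. */
    rc = posix_fallocate(fd, offset, len);
    if (rc == 0)
        return 0;
    if (rc != EOPNOTSUPP)
    {
        errno = rc;
        return -1;
    }

    /* Filesystem declined preallocation: fall back to writing zeros. */
    memset(zeros, 0, sizeof(zeros));
    while (len > 0)
    {
        size_t  chunk = (len < (off_t) sizeof(zeros)) ? (size_t) len : sizeof(zeros);
        ssize_t written = pwrite(fd, zeros, chunk, offset);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return -1;
        }
        offset += written;
        len -= written;
    }
    return 0;
}

/* Hypothetical helper: current file size in bytes, or -1 on error. */
static off_t
file_size(int fd)
{
    struct stat st;

    return (fstat(fd, &st) == 0) ? st.st_size : (off_t) -1;
}
```

On a filesystem that returned EOPNOTSUPP, the caller would get the zero-writing path without any extra setting, which on compressed btrfs is the path that still allows compression.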
regards
--
Tomas Vondra