Re: Disabling Heap-Only Tuples

From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disabling Heap-Only Tuples
Date: 2023-09-18 20:02:04
Message-ID: 069380be10ec2aad5fa951769c8c13212c0fafd0.camel@cybertec.at
Lists: pgsql-hackers

On Mon, 2023-09-18 at 12:22 -0400, Robert Haas wrote:
> On Tue, Sep 5, 2023 at 11:15 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> > I don't think that is a good comparison.  While most people probably
> > never need to touch "local_update_limit", "work_mem" is something everybody
> > has to consider.
> >
> > And it is not so hard to tune: the setting would be the desired table
> > size, and you could use pgstattuple to find a good value.
>
> What I suspect would happen, though, is that you'd end up tuning the
> value over and over. You'd set it to some value and after some number
> of vacuums maybe you'd realize that you could save even more disk
> space if you reduced it a bit further or maybe your data set would
> grow a bit and you'd have to increase it a little (or a lot). And if
> you didn't keep adjusting it then maybe something quite bad would
> happen to your database.

There is that risk, yes.

> work_mem isn't quite the same [...] But what is the same here and in the case
> of work_mem is that you can suddenly get hosed if the situation
> changes substantially and you don't respond by updating the parameter
> setting. In the case of work_mem, again in my experience, it's quite
> common for people to suddenly find themselves in a lot of trouble if
> they have a load spike, because now they're running a lot more copies
> of the same query and the machine runs out of memory.

So the common ground is "both parameters are not so easy to get right,
and if you get them wrong, it's a problem". For me the big difference is
that while you pretty much have to tune "work_mem", you can normally just ignore
"local_update_limit".

> The equivalent
> problem here would be if the table suddenly gets a lot bigger due to a
> load spike or some change in the way the application is used. Then
> suddenly, a setting that was previously serving to keep the table
> pleasantly small and un-bloated on disk is instead causing tons of
> updates that would have been HOT to become non-HOT, which could very
> easily result in both the table and its indexes bloating quite
> rapidly. I really don't like the idea of an anti-bloat feature that,
> when set to the wrong value, becomes a bloat-amplification feature. I
> don't know how to describe that other than "fragile and dangerous."

Yes, you can hurt yourself that way. But that applies to many other
settings as well. You can tank your performance with a bad value for
"commit_delay", "hot_standby_feedback" can bloat your primary, and
so on. Still, we consider these useful parameters.

> Imagine a hypothetical feature that knew how small the table could
> reasonably be kept, say by magic, and did non-HOT updates instead of
> HOT updates whenever doing so would allow moving a tuple from a page
> beyond that magical boundary to an earlier page. Such a feature would
> not have the downsides that this one does -- if there were
> opportunities to make the table smaller, the system would take
> advantage of them automatically, and if the table grew, the system
> would automatically become more relaxed to stay out of trouble. Such a
> feature is clearly more work to design and implement than what is
> proposed here, but it would also work a lot better in practice.

That sounds a bit like saying we should not have "shared_buffers" unless we
have a magical built-in tool that gets the value right automatically.
Yes, the better is the enemy of the good; you can kill any proposal with
a line of reasoning like that.

> In
> fact, I daresay that if we accept the feature as proposed, somebody's
> going to go out and write a tool to calculate what the threshold ought
> to be and automatically adjust it as things change. Users of the tool
> will then divide into two camps:
>
> - People who try to tune it manually and get burned if anything
> changes on their system.
> - People who use that out-of-core tool.
>
> So the out-of-core tool that does this tuning becomes a stealth
> dependency for any user who is facing this problem. Gosh, don't we
> have enough of those already? Connection pooling being perhaps the
> most obvious example, but far from the only one.

I cannot follow you there. What I envision is that "local_update_limit"
is not set permanently on a table. You set it when you realize that your
table has become bloated. Then you either wait until the bloat goes away
or run a couple of UPDATEs that eventually shrink the table, and finally
you reset "local_update_limit" again.
It's a more difficult, but less invasive, alternative to VACUUM (FULL).
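
Concretely, the kind of one-off procedure I have in mind would be roughly
the following (only a sketch, assuming the reloption and semantics from the
proposed patch, which are not in core PostgreSQL; the table name, column
name and the limit of 10000 blocks are invented for illustration, and the
ctid range condition needs PostgreSQL 14 or later):

    ALTER TABLE mytable SET (local_update_limit = 10000);  -- desired size in blocks

    -- either wait for normal write traffic, or nudge things along with
    -- no-op updates of tuples that live beyond the limit
    UPDATE mytable SET some_column = some_column
    WHERE  ctid >= '(10000,1)';

    VACUUM mytable;   -- lets the now-empty tail of the table be truncated

    ALTER TABLE mytable RESET (local_update_limit);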

If a setting is hard to understand and hard to get right, we could invest
in good documentation that explains the use cases and pitfalls.
Wouldn't that go a long way towards defusing this perceived footgun?
I am aware that a frightening number of users don't read documentation,
but I find it hard to believe that anyone would twiddle a non-obvious
knob like "local_update_limit" without first trying to figure out what
it actually does.

Yours,
Laurenz Albe
