Re: Disabling Heap-Only Tuples

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disabling Heap-Only Tuples
Date: 2023-09-18 16:22:26
Message-ID: CA+TgmobCwFL94gxKi6jofxmfpggjJEvwsnaH4wjKJWMCTC3axA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 5, 2023 at 11:15 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
> I don't think that is a good comparison. While most people probably
> never need to touch "local_update_limit", "work_mem" is something everybody
> has to consider.
>
> And it is not so hard to tune: the setting would be the desired table
> size, and you could use pgstattuple to find a good value.

What I suspect would happen, though, is that you'd end up tuning the
value over and over. You'd set it to some value and after some number
of vacuums maybe you'd realize that you could save even more disk
space if you reduced it a bit further or maybe your data set would
grow a bit and you'd have to increase it a little (or a lot). And if
you didn't keep adjusting it then maybe something quite bad would
happen to your database.

work_mem isn't quite the same in the sense that most people don't need
to keep on iteratively tuning work_mem, at least not in my experience.
You figure out a value that works OK in practice and then leave it
alone. The problem is mostly discovering what that initial value ought
to be, which is often hard. But what is the same here and in the case
of work_mem is that you can suddenly get hosed if the situation
changes substantially and you don't respond by updating the parameter
setting. In the case of work_mem, again in my experience, it's quite
common for people to suddenly find themselves in a lot of trouble if
they have a load spike, because now they're running a lot more copies
of the same query and the machine runs out of memory. The equivalent
problem here would be if the table suddenly gets a lot bigger due to a
load spike or some change in the way the application is used. Then
suddenly, a setting that was previously serving to keep the table
pleasantly small and un-bloated on disk is instead causing tons of
updates that would have been HOT to become non-HOT, which could very
easily result in both the table and its indexes bloating quite
rapidly. I really don't like the idea of an anti-bloat feature that,
when set to the wrong value, becomes a bloat-amplification feature. I
don't know how to describe that other than "fragile and dangerous."

Imagine a hypothetical feature that knew how small the table could
reasonably be kept, say by magic, and did non-HOT updates instead of
HOT updates whenever doing so would allow moving a tuple from a page
beyond that magical boundary to an earlier page. Such a feature would
not have the downsides that this one does -- if there were
opportunities to make the table smaller, the system would take
advantage of them automatically, and if the table grew, the system
would automatically become more relaxed to stay out of trouble. Such a
feature is clearly more work to design and implement than what is
proposed here, but it would also work a lot better in practice. In
fact, I daresay that if we accept the feature as proposed, somebody's
going to go out and write a tool to calculate what the threshold ought
to be and automatically adjust it as things change. Users of the tool
will then divide into two camps:

- People who try to tune it manually and get burned if anything
changes on their system.
- People who use that out-of-core tool.

So the out-of-core tool that does this tuning becomes a stealth
dependency for any user who is facing this problem. Gosh, don't we
have enough of those already? Connection pooling being perhaps the
most obvious example, but far from the only one.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2023-09-18 16:55:00 Questioning an errcode and message in jsonb.c
Previous Message Robert Haas 2023-09-18 16:01:19 Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }