Re: Disabling Heap-Only Tuples

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Thom Brown <thom(at)linux(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Disabling Heap-Only Tuples
Date: 2023-09-19 18:20:06
Message-ID: CAEze2WhmE=Vv-deUS83S9qcdOz-Z4zOsk7HrL4GBNM=a+TO3jw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 19 Sept 2023 at 18:52, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Sep 19, 2023 at 12:30 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > I was thinking something vaguely like "a table size that's roughly what
> > an optimal autovacuuming schedule would leave the table at" assuming 0.2
> > vacuum_scale_factor. You would determine the absolute minimum size for
> > the table given the current live tuples in the table, then add 20% to
> > account for a steady state of dead tuples and vacuumed space. So it's
> > not 1.2x of the "current" table size at the time the local_update_limit
> > feature is installed, but 1.2x of the optimal table size.
>
> Right, that would be great. And honestly if that's something we can
> figure out, then why does the parameter even need to be an integer
> instead of a Boolean? If the system knows the optimal table size, then
> the user can just say "try to compact this table" and need not say to
> what size. The 1.2 multiplier is probably situation dependent and
> maybe the multiplier should indeed be a configuration parameter, but
> we would be way better off if the absolute size didn't need to be.

Mostly agreed, but I think there's a pitfall here. You seem to assume
we have a perfect oracle that knows the optimal data size, but we
already know that our estimates can be significantly off. I don't
quite trust the statistics enough to do any calculations based on the
number of tuples in the relation. That also ignores the fact that we
don't actually have any good information about the average size of the
tuples in the table. So with current statistics, any automated "this
is how large the table should be" decisions would result in an
automated footgun, instead of the current patch's where the user has
to decide to configure it to an explicit value.

But about that: I'm not sure what the "footgun" is that you've
mentioned recently?
The issue with excessive bloat (when the local_update_limit is set too
small and fillfactor is low) was fixed in the latest patch nearly
three weeks ago, so the only remaining issue with misconfiguration is
slower updates. Sure, that's not great, but in my opinion not a
"footgun": performance returns immediately after resetting
local_update_limit, and no space was lost.

> > This makes me think that maybe the logic needs to be a little more
> > complex to avoid the problem you describe: if an UPDATE is prevented
> > from being HOT because of this setting, but then it goes and consults
> > FSM and it gives the update a higher block number than the tuple's
> > current block (or it fails to give a block number at all so it is forced
> > to extend the relation), then the update should give up on that strategy
> > and use a HOT update after all. (I have not read the actual patch;
> > maybe it already does this? It sounds kinda obvious.)
>
> +1 to all of that. Anything we can do to reduce the chance of the
> parameter doing the opposite of what it's intended to do is, IMHO,
> really, really valuable. If you're in the situation where you really
> need something like this, you're probably having a pretty bad day
> already.

Yes, it does that with the latest patch, from not quite 3 weeks ago.

> Just to be more clear about my position, I don't think that having
> some kind of a feature along these lines is a bad idea.

Thanks for clarifying.

> I do think
> that this is one of those cases where the perfect is the enemy of the
> good, and we can fall into the trap of saying that since we can't do
> the perfect thing let's not do anything at all. At the same time, just
> because we need to do something doesn't mean we should do exactly the
> first thing that anybody thought up, or that we shouldn't try as hard
> as we can to mitigate the downsides. If we add something like this I
> bet it will get a lot of use. Even a minor improvement to the design
> that removes one pitfall of many could turn out to help a lot of
> people.

100% agreed.

> > Having to set AEL is not nice for sure, but wouldn't
> > ShareUpdateExclusiveLock be sufficient? We have a bunch of reloptions
> > for which that is sufficient.
>
> Hmm, yeah, I think you're right.

Updating the reloption after relation truncation implies having the
same lock as relation truncation, i.e. AEL (if the vacuum docs are to
be believed). So the AEL is not reqiored for updating the storage
option (that would require SUEL), but for the block truncation
operation operation.

Kind regards,

Matthias van de Meent
Neon (http://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-09-19 18:50:13 Re: Disabling Heap-Only Tuples
Previous Message Heikki Linnakangas 2023-09-19 18:05:41 Re: GenBKI emits useless open;close for catalogs without rows