Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-02-04 21:18:47
Message-ID: CA+TgmoZQtdd7Ob0HmwtURF=4TVNGWS0POTG5oz6rvi+EobyWuw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 4, 2022 at 3:31 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Application B will already block pruning by VACUUM operations against
> application A's table, and so effectively blocks recording of the
> resultant free space in the FSM in your scenario. And so application A
> and application B should be considered the same application already.
> That's just how VACUUM works.

Sure ... but that also sucks. If we consider application A and
application B to be the same application, then we're basing our
decision about what to do on information that is inaccurate.
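
To make that concrete, here is a minimal sketch (the table names and
session labels are hypothetical) of how an open snapshot in one
session keeps VACUUM in another session from pruning, and therefore
from recording the freed space in the FSM:

-- Session "B": an idle-in-transaction snapshot holds back the xmin horizon
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM b_table;  -- snapshot established; transaction left open

-- Session "A": the dead tuples in its table are not yet removable
DELETE FROM a_table WHERE id < 100000;
VACUUM VERBOSE a_table;
-- reports something like "... are dead but not yet removable"; the pages'
-- free space isn't recorded in the FSM until B's transaction ends and a
-- later VACUUM or opportunistic prune gets to run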

> 5% larger seems like a lot more than would be typical, based on what
> I've seen. I don't think that the regression in this scenario can be
> characterized as "infinitely worse", or anything like it. On a long
> enough timeline, the potential upside of something like this is nearly
> unlimited -- it could avoid a huge amount of write amplification. But
> the potential downside seems to be small and fixed -- which is the
> point (bounding the downside). The mere possibility of getting that
> big benefit (avoiding the costs from heap fragmentation) is itself a
> benefit, even when it turns out not to pay off in your particular
> case. It can be seen as insurance.

I don't see it that way. There are cases where avoiding writes is
better, and cases where trying to cram everything into the fewest
possible pages is better. With the right test case you can make either
strategy look superior. What I think your test case has going for it
is that it is similar to something that a lot of people, really a ton
of people, actually do with PostgreSQL. However, it's not going to be
an accurate model of what everybody does, and therein lies some
element of danger.

> Then what could you have confidence in?

Real-world experience. That's hard to get if we never commit any
patches, but it's a good argument for (a) having them tested by
multiple different hackers who invent test cases independently, and
(b) adding some configurability where we can reasonably include it, so
that anyone who does experience problems has an escape.

--
Robert Haas
EDB: http://www.enterprisedb.com
