Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date: 2022-03-01 21:46:40
Message-ID: CA+TgmoZv6q3wxxUm=R+5gB4_FmP27pjnGsWRdEJekC0uQniXcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 20, 2022 at 3:27 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I think that the idea has potential, but I don't think that I
> > understand yet what the *exact* algorithm is.
>
> The algorithm seems to exploit a natural tendency that Andres once
> described in a blog post about his snapshot scalability work [1]. To a
> surprising extent, we can usefully bucket all tuples/pages into two
> simple categories:
>
> 1. Very, very old ("infinitely old" for all practical purposes).
>
> 2. Very very new.
>
> There doesn't seem to be much need for a third "in-between" category
> in practice. This seems to be at least approximately true all of the
> time.
>
> Perhaps Andres wouldn't agree with this very general statement -- he
> actually said something more specific. I for one believe that the
> point he made generalizes surprisingly well, though. I have my own
> theories about why this appears to be true. (Executive summary: power
> laws are weird, and it seems as if the sparsity-of-effects principle
> makes it easy to bucket things at the highest level, in a way that
> generalizes well across disparate workloads.)

I think that this is not really a description of an algorithm -- and I
think that it is far from clear that the third "in-between" category
does not need to exist.

> Remember when I got excited about how my big TPC-C benchmark run
> showed a predictable, tick/tock style pattern across VACUUM operations
> against the order and order lines table [2]? It seemed very
> significant to me that the OldestXmin of VACUUM operation n
> consistently went on to become the new relfrozenxid for the same table
> in VACUUM operation n + 1. It wasn't exactly the same XID, but very
> close to it (within the range of noise). This pattern was clearly
> present, even though VACUUM operation n + 1 might happen as long as 4
> or 5 hours after VACUUM operation n (this was a big table).

I think findings like this are very unconvincing. TPC-C (or any
benchmark really) is so simple as to be a terrible proxy for what
vacuuming is going to look like on real-world systems. Like, it's nice
that it works, and it shows that something's working, but it doesn't
demonstrate that the patch is making the right trade-offs overall.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-03-01 21:50:08 Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Previous Message Andrew Dunstan 2022-03-01 21:41:38 Re: SQL/JSON: functions