Re: pg_serial bloat

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_serial bloat
Date: 2023-12-21 22:05:14
Message-ID: CA+hUKGK8nkZTd4v2=OQi1_jKB9O+Cgny6=1oSKV2GTSZ=3O8KQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 15, 2023 at 9:53 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> ... We've seen a system with ~30GB of files in there
> (note: full/untruncated size would be 2³² xids × sizeof(uint64_t) =
> 32GB). It's not just a gradual disk space leak: according to disk
> space monitoring, this system suddenly wrote ~half of that data, which
> I think must be the while loop in SerialAdd() zeroing out pages.
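To make the size arithmetic concrete, here is a small C sketch. It assumes the default 8 kB BLCKSZ and 32 pages per SLRU segment file (both are build-time settings, so treat them as assumptions), and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed build defaults; a given installation may differ. */
#define ASSUMED_BLCKSZ 8192u
#define ASSUMED_PAGES_PER_SEGMENT 32u

/* One 64-bit commit sequence number is stored per xid. */
#define ENTRY_SIZE ((uint64_t) sizeof(uint64_t))

/* Total bytes if every one of the 2^32 xids had an entry. */
uint64_t full_serial_bytes(void)
{
    return (UINT64_C(1) << 32) * ENTRY_SIZE;
}

/* Number of uint64 entries that fit on one page. */
uint64_t entries_per_page(void)
{
    return ASSUMED_BLCKSZ / ENTRY_SIZE;
}

/* Pages needed to cover all 2^32 xids. */
uint64_t full_serial_pages(void)
{
    assert((UINT64_C(1) << 32) % entries_per_page() == 0);
    return (UINT64_C(1) << 32) / entries_per_page();
}

/* Segment files needed to hold those pages. */
uint64_t full_serial_segments(void)
{
    return full_serial_pages() / ASSUMED_PAGES_PER_SEGMENT;
}
```

Under those assumptions a fully populated pg_serial would be 2²² pages spread over 131072 segment files, 32 GiB in total.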

Attempt at an analysis of this rare anti-social I/O pattern:

SerialAdd() writes zero pages in a range from the old headPage up to
some target page, but headPage can be any number, arbitrarily far in
the past (or apparently, the future). It only keeps up with the
progress of the xid clock and spreads that work out if we happen to
call SerialAdd() often enough. If we call SerialAdd() only every
couple of billion xids (e.g. very occasionally you leave a transaction
open and go out to lunch on a very busy system using SERIALIZABLE
everywhere), you might find yourself suddenly needing to write out
many gigabytes of zeroes there.
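A rough model of why a long gap between calls hurts, as a C sketch. It treats the page space as a plain circle of 2²² pages (assuming 8 kB pages of 1024 uint64 entries) and counts the pages that would be zeroed walking forward from the old head page to the target page. page_for_xid() and pages_to_zero() are hypothetical helpers, not the actual SerialAdd() code, which has its own page-comparison logic:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults: 8 kB pages holding 1024 uint64 entries each. */
#define ASSUMED_BLCKSZ 8192u
#define ENTRIES_PER_PAGE (ASSUMED_BLCKSZ / sizeof(uint64_t))
/* 2^32 xids / 1024 entries per page = 2^22 pages in the circular space. */
#define NUM_PAGES ((uint64_t) 1 << 22)

/* Hypothetical helper: page holding a given 32-bit xid. */
uint64_t page_for_xid(uint32_t xid)
{
    return xid / ENTRIES_PER_PAGE;
}

/*
 * Hypothetical helper modelling the catch-up loop: how many pages must
 * be zeroed to advance from head_page to target_page, walking forward
 * around the circular page space.
 */
uint64_t pages_to_zero(uint64_t head_page, uint64_t target_page)
{
    assert(head_page < NUM_PAGES && target_page < NUM_PAGES);
    return (target_page + NUM_PAGES - head_page) % NUM_PAGES;
}

/* Bytes of zeroes written for that catch-up. */
uint64_t bytes_to_zero(uint64_t head_page, uint64_t target_page)
{
    return pages_to_zero(head_page, target_page) * ASSUMED_BLCKSZ;
}
```

In this model a gap of 2³¹ xids between calls works out to 2²¹ pages, or 16 GiB of zeroes written in one go, which is in the same ballpark as the "~half" of 32GB mentioned above.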

One observation is that headPage gets periodically zapped to -1 by
checkpoints, near the comment "SLRU is no longer needed", providing a
periodic dice-roll that chops the range down. Unfortunately the
historical "apparent wraparound" bug prevents that from being reached.
That bug was fixed by commit d6b0c2b (master only, no back-patch). On
the system where we saw pg_serial going bananas, the "apparent
wraparound" message appeared regularly.

Attempts to find a solution:

I think it might make sense to clamp firstZeroPage into the page range
implied by tailXid and headXid. Those values are eagerly maintained and
interlock with snapshots and global xmin (correctly but
under-documented-ly, AFAICS so far), and we will never try to look up
the CSN for any xid outside that range. I think that should exclude
the pathological zero-writing cases. I wouldn't want to do this
without a working reproducer though, which will take some effort.
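One possible reading of that clamp, sketched in C over the same assumed circular page space of 2²² pages: if firstZeroPage falls outside the window from the tail page to the target page, start zeroing at the tail page instead, so the work is bounded by the size of the live window rather than by how long ago SerialAdd() last ran. clamp_first_zero_page() and page_distance() are hypothetical names; the caller is assumed to invoke it only when the target page is ahead of the recorded head, and this is an illustration, not a tested patch:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: 2^32 xids / 1024 entries per page = 2^22 pages. */
#define NUM_PAGES ((uint64_t) 1 << 22)

/* Forward distance from 'from' to 'to' in the circular page space. */
uint64_t page_distance(uint64_t from, uint64_t to)
{
    return (to + NUM_PAGES - from) % NUM_PAGES;
}

/*
 * Hypothetical clamp: if first_zero_page lies outside the circular
 * window [tail_page, target_page], restart at tail_page, since no xid
 * older than the tail will ever be looked up.
 * Precondition (assumed): target_page is ahead of the recorded head.
 */
uint64_t clamp_first_zero_page(uint64_t first_zero_page,
                               uint64_t tail_page,
                               uint64_t target_page)
{
    assert(first_zero_page < NUM_PAGES);
    assert(tail_page < NUM_PAGES && target_page < NUM_PAGES);

    if (page_distance(tail_page, first_zero_page) >
        page_distance(tail_page, target_page))
        return tail_page;       /* outside the live window */
    return first_zero_page;     /* already inside: keep it */
}
```

With the clamp applied, the number of pages zeroed in one call can be no more than the distance from the tail page to the target page, plus one.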

Another thought is that in the glorious 64-bit future, we might be
able to invent a "sparse" SLRU, where if the file or page doesn't
exist, we just return a zero CSN, and when we write a new page we just
let the OS provide filesystem holes as required. The reason I
wouldn't want to invent sparse SLRUs with 32-bit indexing is that we
have no confidence in the truncation logic, which might leave stray
files from earlier epochs. So I think we need zeroed pages (or
perhaps at least to confirm that there is nothing already there, but I
have zero desire to make the current wraparound-ridden system more
complex).
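A minimal sketch of the "sparse" lookup idea, using plain POSIX pread()/pwrite() on a single file of uint64 entries. The file layout, native byte order and function names are all assumptions for illustration: a missing file or a read past EOF yields a zero CSN, and writing far into the file leaves a filesystem hole rather than explicitly written zero pages. It deliberately ignores the 32-bit wraparound and stray-file problems described above:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical sparse lookup: read the CSN for entry 'index' from the
 * file at 'path'. A missing file, or an offset past EOF, yields 0;
 * holes inside the file also read back as zeros.
 * Returns 0 on success, -1 on an unexpected I/O error.
 */
int sparse_read_csn(const char *path, uint64_t index, uint64_t *csn)
{
    assert(csn != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        if (errno == ENOENT)
        {
            *csn = 0;           /* no file: treat as never written */
            return 0;
        }
        return -1;
    }
    uint64_t value = 0;
    ssize_t n = pread(fd, &value, sizeof(value),
                      (off_t) (index * sizeof(value)));
    close(fd);
    if (n < 0)
        return -1;
    if (n < (ssize_t) sizeof(value))
        value = 0;              /* past EOF (or short read): treat as zero */
    *csn = value;
    return 0;
}

/* Hypothetical sparse write: writing past EOF leaves a hole behind it. */
int sparse_write_csn(const char *path, uint64_t index, uint64_t csn)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &csn, sizeof(csn),
                       (off_t) (index * sizeof(csn)));
    int rc = (n == (ssize_t) sizeof(csn)) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The attraction is that nothing ever writes zeroes explicitly: unwritten ranges cost no I/O, and on filesystems that support holes they cost no disk space either.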
