Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
Date: 2022-09-22 16:51:01
Message-ID: CAH2-WzmLCn2Hx9tQLdmdb+9CkHKLyWD2bsz=PmRebc4dAxjy6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 21, 2022 at 9:21 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> I wouldn't mind giving this a try.

Definitely seems worth experimenting with. Many WAL records generated
during VACUUM (and during opportunistic pruning/index tuple deletion)
have offset number arrays that can be assumed to be both sorted and
unique. My guess is that these WAL record types are the most amenable
to compression at the WAL record level.

> Yeah, it seems likely that we could pack offsets in single bytes in many
> cases. A more sophisticated approach could even choose how many bits to
> use per offset based on the maximum in the array. Furthermore, we might be
> able to make use of SIMD instructions to mitigate any performance penalty.

I guess I'd start with some kind of differential compression that
relies on the arrays being both sorted and unique. While it might be
important to be able to compose together multiple different techniques
(something that is more than the sum of its parts can be very useful),
it seems most important to quickly validate the basic idea first.

One obvious thing that still seems worth pointing out: you may not
need to decompress anything. All that you really need to do is to get
the logic from some routine like PageIndexMultiDelete() to be executed
by a REDO routine. Perhaps it'll make sense to come up with a
representation that can just be passed to the routine directly. (I
really don't know how likely this is, but it's something to consider.)

> I'm tempted to start by just using single-byte offsets when possible since
> that should be relatively simple while still yielding a decent improvement
> for many workloads. What do you think?

The really big wins for WAL size will come from applying high level
insights about what is really needed, in all likelihood. The main
overhead is very often generic WAL record header overhead. When it is
then it'll be hard to really move the needle just by compressing the
payload. To do that you'd need to find a way to use fewer WAL records
to do the same amount of work.

The thing that I think will really make the biggest difference is
merging PRUNE records with FREEZE records (and maybe even make the
merged record do the work of a VISIBLE record when that's possible).
Just because now you have 1 WAL record instead of 2 (or even 3) WAL
records. Obviously that's a complicated project, but it's another case
where it feels like the more efficient approach might also be simpler.
We often write a PRUNE record with only one or two items in the array,
in which case it's practically free to do some freezing, at least in
terms of WAL space overhead (as long as you can do it with the same
WAL record). Plus freezing is already inescapably tied to pruning --
we always prune a page that we're going to try to freeze in VACUUM (we
can't safely freeze dead tuples, so there is more or less a dependency
already).

Not that you shouldn't pursue compressing the payload from WAL records
as a project -- maybe that'll work very well. I'm just pointing out
that there is a bigger picture, that may or may not end up mattering
here. For the patch on this thread there certainly is a bigger picture
about costs over time. Something like that could be true for this
other patch too.

It's definitely worth considering the size of the WAL records when
there are only one or two items, how common that may be in each
individual case, etc.
For example, FREEZE records have a minimum size of 64 bytes in
practice, due to WAL record alignment overhead (the payload itself
doesn't usually have to be aligned, but the WAL header still does). It
may be possible to avoid going over the critical threshold that makes
the WAL size one MAXALIGN() quantum larger in the event of having only
a few tuples to freeze, a scenario where negative compression is
likely.

Negative compression is always a potential problem, but maybe you can
deal with it very effectively by thinking about the problem
holistically. If you're "wasting" space that was just going to be
alignment padding anyway, does it really matter at all? (Maybe there
is some reason to care, but I offhand I can't think of one.)

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2022-09-22 16:54:49 Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
Previous Message Tom Lane 2022-09-22 16:48:47 Re: Extending outfuncs support to utility statements