Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
Date: 2022-09-21 21:11:36
Message-ID: CAH2-WzmoMBOvKsqb5qt=LMVDUctJZn3ZDp1CQpuJiu1eXDx5zg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> This idea seems promising. I see that you called this patch a
> work-in-progress, so I'm curious what else you are planning to do with it.

I really just meant that the patch wasn't completely finished at that
point. I hadn't yet convinced myself that I mostly had it right. I'm
more confident now.

> As I'm reading this thread and the patch, I'm finding myself wondering if
> it's worth exploring using wal_compression for these records instead.

The term deduplication works better than compression here because
we're not actually decompressing anything in the REDO routine. Rather,
the REDO routine processes each freeze plan by processing all affected
tuples in order. To me that seems like the natural way to structure
things, given the specifics of how freezing works at a high level --
the WAL records come out much smaller, but in a way that's almost
incidental.
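
To illustrate (a simplified sketch -- the struct and function names
here are illustrative, not the exact ones from the patch): tuples
whose freeze plans come out identical share a single plan in the
record, followed by an array of the page offset numbers that plan
applies to, and REDO just walks that structure:

    #include <stdint.h>

    typedef uint32_t TransactionId;
    typedef uint16_t OffsetNumber;

    /* One deduplicated freeze plan, shared by ntuples tuples */
    typedef struct FreezePlan
    {
        TransactionId xmax;        /* xmax value to set */
        uint16_t      t_infomask2; /* infomask2 bits to set */
        uint16_t      t_infomask;  /* infomask bits to set */
        uint16_t      ntuples;     /* offsets that follow for this plan */
    } FreezePlan;

    /* REDO-style loop: apply each plan to all of its tuples in order */
    static void
    redo_freeze_page(const FreezePlan *plans, int nplans,
                     const OffsetNumber *offsets)
    {
        int curoff = 0;

        for (int p = 0; p < nplans; p++)
        {
            for (int i = 0; i < plans[p].ntuples; i++)
            {
                OffsetNumber offnum = offsets[curoff++];

                /* ... apply plans[p] to the tuple at offnum ... */
                (void) offnum;
            }
        }
    }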

> I think you've essentially created an efficient compression mechanism for
> this one type of record, but I'm assuming that lz4/zstd would also yield
> some rather substantial improvements for this kind of data.

I don't think of it that way. I've used the term "deduplication" to
advertise the patch, but that's mostly just a description of what
we're doing in the patch relative to what we do on HEAD today. There
is nothing truly clever in the patch. We see a huge amount of
redundancy among tuples from the same page in practically all cases,
for reasons that have everything to do with what freezing is, and how
it works at a high level. The thought process that led to my writing
this patch was more high level than appearances suggest. (I often
write patches that combine high level and low level insights in some
way or other, actually.)

Theoretically there might not be very much redundancy within each
xl_heap_freeze_page record, with the right workload, but in practice a
reduction of 4x or more is all but guaranteed once you have more than
a few tuples to freeze on each page. If there are other WAL records
that are as space inefficient as xl_heap_freeze_page, I'd be
surprised -- it is *unusually* space inefficient (like I said, I
suspect that this may have something to do with the fact that it was
originally designed under time pressure). So I don't expect that this
patch tells us much about what we should do for any other WAL record.
I certainly *hope* that it doesn't, at least.
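
To put rough numbers on that (the sizes here are back-of-the-envelope
assumptions for illustration, not the patch's exact on-disk sizes):
on HEAD each tuple to be frozen carries its own full plan -- offset,
xmax, both infomasks, and a flag -- call it ~12 bytes with alignment.
With deduplication, a page whose tuples all share one plan pays for
that plan once (~12 bytes) plus a 2-byte offset per tuple. Freezing
50 such tuples then costs roughly 600 bytes of payload on HEAD versus
roughly 112 bytes with the patch, better than 5x, and the ratio only
improves as more tuples on the page share a plan.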

> Presumably a
> generic WAL record compression mechanism could be reused for other large
> records, too. That could be much easier than devising a deduplication
> strategy for every record type.

It's quite possible that that's a good idea, but it should probably
work as an additive thing. That's something that I think of as a
"clever technique", whereas I'm focussed on just not being naive in
how we represent this one specific WAL record type.

--
Peter Geoghegan
