Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format

From: Dharin Shah <dharinshah95(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Treat <rob(at)xzilla(dot)net>, Peter Eisentraut <peter(at)eisentraut(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
Date: 2025-12-25 00:54:46
Message-ID: CAOj6k6dG0M-G0gs_Htray5y_pfvacXSjKYw0uJaaYJvOeqk7xw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgadmin-hackers pgsql-hackers

Thanks Michael & Robert,

Agreed — I don’t think it’s realistic or practical to talk about
deprecating or replacing pglz (or lz4) given on-disk compatibility
requirements.

> Note that I am not on board with simply reusing varatt_external for
> zstd-compressed entries, neither do I think that this is the best move
> ever. It makes the core patch simpler, but it makes things like
> ToastCompressionId more complicated to think about. If anything, I'd
> consider a rename of varatt_external as the best way to go with an
> intermediate "translation" structure only used in memory as I am
> proposing on the other thread (something that others seem meh enough
> about but I am not seeing alternate proposals floating around,
> either). This would make things like detoast_external_attr() less
> confusing, I think, as the latest patch posted on this thread is
> actually proving with its shortcut for toast_fetch_datum as one
> example of something I'd rather not do..

On the design: I understand & share the same concerns that we’d end up with
multiple “sources of truth” for external compression method identification
(pglz/lz4 via va_extinfo bits, zstd via vartag), and that this pushes
method-specific shortcuts into detoast paths.

Would you be OK if I split this into two steps?

1.First, a refactor-only patch introducing a small decoded/in-memory
representation of an external TOAST pointer, so detoast/toast code paths
don’t have to reason directly about tcinfo vs vartag vs va_extinfo. This
would be a cleanup with no on-disk format change and no behavioral change
for existing methods. Is this the same “translation structure” approach you
mentioned in the other thread? If you can point me to it, I’ll align with
that proposal.

2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD,
implemented on top of that abstraction to keep zstd handling centralized
and minimize special-casing in detoast.
If that direction matches what you had in mind, I can first post the
proposed translation structure/API for feedback before respinning the zstd
patch.

Thanks,
Dharin

On Thu, Dec 25, 2025 at 1:25 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:
> > Agreed that I can't see pglz being removed any time soon, if ever.
> > Thinking through what a conversion process would look like seems
> > unwieldy at best, so I think we definitely need it for backwards
> > compatibility, plus I think it is useful to have a self-contained
> > option. I'd almost suggest we should look at replacing lz4, but I
> > don't think that is significantly easier, it just has a smaller, more
> > invested, blast radius.
>
> Backward-compatibility requirements make a replacement of LZ4
> basically impossible to me, for the same reasons as pglz. We could
> not replace the bit used in the va_extinfo to track if LZ4 compression
> is used, either. One thing that I do wonder is if it would make
> things simpler in the long-run if we introduced a new separated vartag
> for LZ4-compressed external TOAST pointers as well. At least we'd
> have a leaner design: it means that we have to keep the
> varatt_external available on read, but we could update to the new
> format when writing entries. Or perhaps that's not worth the
> complication based on the last sentence you are writing..
>
> > That said, I do suspect ztsd could quickly
> > become a popular recommendation and/or default among users /
> > consultants / service providers.
>
> .. Because I strongly suspect that this is going to be true, and that
> zstd would just be a better replacement over lz4. That's a trend that
> I see is already going on for wal_compression.
>
> Note that I am not on board with simply reusing varatt_external for
> zstd-compressed entries, neither do I think that this is the best move
> ever. It makes the core patch simpler, but it makes things like
> ToastCompressionId more complicated to think about. If anything, I'd
> consider a rename of varatt_external as the best way to go with an
> intermediate "translation" structure only used in memory as I am
> proposing on the other thread (something that others seem meh enough
> about but I am not seeing alternate proposals floating around,
> either). This would make things like detoast_external_attr() less
> confusing, I think, as the latest patch posted on this thread is
> actually proving with its shortcut for toast_fetch_datum as one
> example of something I'd rather not do..
> --
> Michael
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-12-25 01:30:37 Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
Previous Message Michael Paquier 2025-12-25 00:45:33 Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring

Browse pgadmin-hackers by date

  From Date Subject
Previous Message Michael Paquier 2025-12-25 00:24:57 Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format