From: Nikhil Kumar Veldanda <veldanda(dot)nikhilkumar17(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ZStandard (with dictionaries) compression support for TOAST compression
Date: 2025-06-05 07:03:49
Message-ID: CAFAfj_FE-wHQtLBkOVab4MR=1GKErsLL7=HZbWBVnAxa=xncSA@mail.gmail.com
Lists: pgsql-hackers
Thanks, Michael, for the feedback.
On Fri, May 30, 2025 at 12:21 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
>
> > During compression, compression methods (zstd_compress_datum) will
> > determine whether to use metadata(dictionary) or not based on
> > CompressionInfo.meta.
>
> Not sure about this one.
I removed the meta field and the CompressionInfo struct. I originally
used CompressionInfo to carry the compression method, zstd_level, and
zstd dict ID downstream, but since we’re now using a default
compression level for zstd, it’s no longer needed.
>
> > ALTER TABLE tblname
> > ALTER COLUMN colname
> > SET (zstd_level = 5);
> > ```
> >
>
> Specifying that as an attribute option makes sense here, but I don't
> think that this has to be linked to the initial patch set that should
> extend the toast data for the new compression method. It's a bit hard
> to say how relevant that is, and IMV it's kind of hard for users to
> know which level makes more sense. Setting up the wrong level can be
> equally very costly in CPU. For now, my suggestion would be to focus
> on the basics, and discard this part until we figure out the rest.
>
Ack. I’ve removed that option and will stick with ZSTD_CLEVEL_DEFAULT
as the compression level.
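For context, the compression path now boils down to something like this
(simplified sketch, not the patch verbatim: the routine name
zstd_compress_datum was mentioned upthread, but the body here is an
assumption and the compressed-datum header handling is elided):

#include "postgres.h"
#include "varatt.h"
#include <zstd.h>

static struct varlena *
zstd_compress_datum(const struct varlena *value)
{
	int32		valsize = VARSIZE_ANY_EXHDR(value);
	size_t		bound = ZSTD_compressBound(valsize);
	struct varlena *tmp = (struct varlena *) palloc(bound + VARHDRSZ);
	size_t		len;

	/* Always compress at the default level; no per-column knob. */
	len = ZSTD_compress(VARDATA(tmp), bound,
						VARDATA_ANY(value), valsize,
						ZSTD_CLEVEL_DEFAULT);
	if (ZSTD_isError(len))
	{
		pfree(tmp);
		return NULL;			/* caller keeps the value uncompressed */
	}
	SET_VARSIZE(tmp, len + VARHDRSZ);
	return tmp;
}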
> +CompressionInfo
> +setup_cmp_info(char cmethod, Form_pg_attribute att)
Removed setup_cmp_info and its references.
> This routine declares a Form_pg_attribute as argument, does not use
> it. Due to that, it looks that attoptcache.h is pulled into
> toast_compression.c.
>
Removed it.
> Patch 0001 has the concept of metadata with various facilities, like
> VARATT_4BCE_HAS_META(), CompressionInfo, etc. However at the current
> stage we don't need that at all. Wouldn't it be better to delay this
> kind of abstraction layer to happen after we discuss how (and if) the
> dictionary part should be introduced rather than pay the cost of the
> facility in the first step of the implementation? This is not
> required as a first step. The toast zstd routines introduced in patch
> 0002 use !meta, discard meta=true as an error case.
>
Removed all metadata-related abstractions from patch 0001.
> +/* Helper: pack <flag, cmid> into a single byte: flag (b0), cmid-2
> (b1..7) */
>
> Having a one-liner here is far from enough? This is the kind of thing
> where we should spend time describing how things are done and why they
> are done this way. This is not sufficient, there's just too much to
> guess. The fact that we have VARATT_4BCE_EXTFLAG is only, but there's
> no real information about va_ecinfo and that it relates to the three
> bits sets, for example.
>
I’ve added a detailed comment explaining the one-byte layout.
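The gist of that layout, sketched here with illustrative macro names
(the patch's exact spellings may differ):

/*
 * One-byte layout of the extended-compression info byte (va_ecinfo):
 *
 *   bit 0      extension flag (cf. VARATT_4BCE_EXTFLAG)
 *   bits 1-7   compression method id stored as (cmid - 2), since
 *              methods 0 (pglz) and 1 (lz4) never need this byte.
 */
#define ECINFO_PACK(flag, cmid) \
	((uint8) (((flag) & 0x01) | (((cmid) - 2) << 1)))
#define ECINFO_FLAG(info)	((info) & 0x01)
#define ECINFO_CMID(info)	((uint8) ((((info) >> 1) & 0x7f) + 2))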
> +#define VARTAG_SIZE(PTR) \
> [...]
> UNALIGNED_U32()
>
> This stuff feels magic. It's hard for someone to understand what's
> going on here, and there is no explanation about why it's done this
> way.
>
To clarify: we need to read a 32-bit value from an unaligned address
(specifically va_extinfo inside varatt_external) to determine the
toast_pointer size, by checking whether the top two bits equal 0b11,
which indicates an optional trailer. I wrote two variants of
READ_U32_UNALIGNED(ptr), selected by host byte order, that load the
four bytes individually and reassemble the word:

/*
 * Safely read a 32-bit unsigned integer from any address, even one
 * that is not naturally aligned to 4 bytes: load one byte at a time
 * and reassemble the word in host byte order.
 */
#ifdef WORDS_BIGENDIAN
#define READ_U32_UNALIGNED(ptr) \
( (uint32) (((const uint8 *)(ptr))[3]) \
| ((uint32)(((const uint8 *)(ptr))[2]) << 8) \
| ((uint32)(((const uint8 *)(ptr))[1]) << 16) \
| ((uint32)(((const uint8 *)(ptr))[0]) << 24) )
#else
#define READ_U32_UNALIGNED(ptr) \
( (uint32) (((const uint8 *)(ptr))[0]) \
| ((uint32)(((const uint8 *)(ptr))[1]) << 8) \
| ((uint32)(((const uint8 *)(ptr))[2]) << 16) \
| ((uint32)(((const uint8 *)(ptr))[3]) << 24) )
#endif
Alternatively, one could use a memcpy-based version (written here as a
GCC/Clang statement expression):
#define READ_U32_UNALIGNED(src) \
({ \
uint32 _tmp; \
memcpy(&_tmp, (src), sizeof(uint32)); \
_tmp; \
})
I chose the byte-by-byte version to avoid extra instructions in a hot path.
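For illustration, the size check then reads roughly like this (sketch;
the variable names are mine, the TOAST_POINTER_*_SIZE constants are
from the patch):

uint32		extinfo = READ_U32_UNALIGNED(&toast_pointer.va_extinfo);
Size		ptr_size;

if ((extinfo >> 30) == 0x3)		/* top two bits 0b11: trailer present */
	ptr_size = TOAST_POINTER_EXT_SIZE;
else
	ptr_size = TOAST_POINTER_NOEXT_SIZE;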
> -toast_compress_datum(Datum value, char cmethod)
> +toast_compress_datum(Datum value, CompressionInfo cmp)
> [...]
> - /* If the compression method is not valid, use the current default */
> - if (!CompressionMethodIsValid(cmethod))
> - cmethod = default_toast_compression;
>
> Removing the fallback to the default toast compression GUC if nothing
> is valid does not look right. There could be extensions that depend
> on that, and it's unclear what the benefits of setup_cmp_info() are,
> because it is not documented, so it's hard for one to understand how
> to use these changes.
I removed setup_cmp_info; all related code has been deleted.
> - result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
> + result = (struct varlena *) palloc(TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE : TOAST_POINTER_NOEXT_SIZE);
> [...]
> - memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
> sizeof(toast_pointer));
> + memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
> TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE - VARHDRSZ_EXTERNAL
> : TOAST_POINTER_NOEXT_SIZE - VARHDRSZ_EXTERNAL) ;
>
> That looks, err... Hard to maintain to me. Okay, that's a
> calculation for the extended compression part, but perhaps this is a
> sign that we need to think harder about the surroundings of the
> toast_pointer to ease such calculations.
>
I simplified this by introducing a helper macro:

#define TOAST_POINTER_SIZE(cm) \
(TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE : TOAST_POINTER_NOEXT_SIZE)

Now both the palloc call and the memcpy length calculation simply use
TOAST_POINTER_SIZE(cm) and TOAST_POINTER_SIZE(cm) - VARHDRSZ_EXTERNAL,
respectively.
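Concretely, the call sites from the quoted diff reduce to:

result = (struct varlena *) palloc(TOAST_POINTER_SIZE(cm));
memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
	   TOAST_POINTER_SIZE(cm) - VARHDRSZ_EXTERNAL);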
> + {
> + {
> + "zstd_level",
> + "Set column's ZSTD compression level",
> + RELOPT_KIND_ATTRIBUTE,
> + ShareUpdateExclusiveLock
> + },
> + DEFAULT_ZSTD_LEVEL, MIN_ZSTD_LEVEL, MAX_ZSTD_LEVEL
> + },
>
> This could be worth a patch on its own, once we get the basics sorted
> out. I'm not even sure that we absolutely need that, TBH. The last
> time I've had a discussion on the matter for WAL compression we
> discarded the argument about the level because it's hard to understand
> how to tune, and the default is enough to work magics. For WAL, we've
> been using ZSTD_CLEVEL_DEFAULT in xloginsert.c, and I've not actually
> heard much about people wanting to tune the compression level. That
> was a few years ago, perhaps there are some more different opinions on
> the matter.
>
Removed it.
> Your patch introduces a new compression_zstd, touching very lightly
> compression.sql. I think that we should and can do much better than
> that in the long term. The coverage of compression.sql is quite good,
> and what the zstd code is adding does not cover all of it. Let's
> rework the tests of HEAD and split compression.sql for the LZ4 and
> pglz parts. If one takes a diff between compression.out and
> compression_1.out, he/she would notice that the only differences are
> caused by the existence of the lz4 table. This is not the smartest
> move we can do if we add more compression methods, so I'd suggest the
> following:
> - Add a new SQL function called pg_toast_compression_available(text)
> or similar, able to return if a toast compression method is supported
> or not. This would need two arguments once the initial support for
> zstd is done: lz4 and zstd. For head, we only require one: lz4.
> - Now, the actual reason why a function returning a boolean result is
> useful is for the SQL tests. It is possible with \if to make the
> tests conditional if LZ4 is supported or now, limiting the noise if
> LZ4 is not supported. See for example the tricks we use for the UTF-8
> encoding or NUMA.
> - Move the tests related to lz4 into a separate file, outside
> compression.sql, in a new file called compression_lz4.sql. With the
> addition of zstd toast support, we would add a new file:
> compression_zstd.sql. The new zstd suite would then just need to
> copy-paste the original one, with few tweaks. It may be better to
> parameterize that but we don't do that anymore these days with
> input/output regression files.
Agreed. I introduced pg_compression_available(text) and refactored the
SQL tests accordingly: the LZ4 tests moved into compression_lz4.sql,
and a new compression_zstd.sql covers zstd with the appropriate tweaks.
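For reference, the new function is essentially the following
(simplified sketch, not the patch verbatim, assuming the usual
USE_LZ4/USE_ZSTD configure symbols):

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

Datum
pg_compression_available(PG_FUNCTION_ARGS)
{
	char	   *method = text_to_cstring(PG_GETARG_TEXT_PP(0));

	if (strcmp(method, "pglz") == 0)
		PG_RETURN_BOOL(true);	/* always built in */
	if (strcmp(method, "lz4") == 0)
	{
#ifdef USE_LZ4
		PG_RETURN_BOOL(true);
#else
		PG_RETURN_BOOL(false);
#endif
	}
	if (strcmp(method, "zstd") == 0)
	{
#ifdef USE_ZSTD
		PG_RETURN_BOOL(true);
#else
		PG_RETURN_BOOL(false);
#endif
	}
	PG_RETURN_BOOL(false);		/* unknown method */
}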
Summary of the attached patches:
v25-0001-Add-pg_compression_available-and-split-sql-compr.patch:
adds the pg_compression_available function and splits the SQL tests
related to compression.
v25-0002-Design-to-extend-the-varattrib_4b-varatt_externa.patch:
design proposal for extending varattrib_4b and varatt_external.
v25-0003-Implement-Zstd-compression-no-dictionary-support.patch:
zstd compression implementation, without dictionary support.
--
Nikhil Veldanda
Attachments:
- v25-0003-Implement-Zstd-compression-no-dictionary-support.patch (application/octet-stream, 44.3 KB)
- v25-0001-Add-pg_compression_available-and-split-sql-compr.patch (application/octet-stream, 45.5 KB)
- v25-0002-Design-to-extend-the-varattrib_4b-varatt_externa.patch (application/octet-stream, 20.1 KB)