Re: [PATCH] Clarify the comments about varlena header encoding

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Clarify the comments about varlena header encoding
Date: 2022-08-18 04:14:34
Message-ID: CAFBsxsGkYq_atJ7i157cWG-YE2EPV2Bwq60Vw1LHJ7vjx+gFgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 18, 2022 at 1:06 AM Aleksander Alekseev
<aleksander(at)timescale(dot)com> wrote:
>
> Hi hackers,
>
> I noticed that the comments regarding bit layouts for varlena headers
> in postgres.h are somewhat misleading. For instance, when reading:

I agree it's confusing, but I don't think this patch is the right direction.

> ```
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> ```
>
> ... one can assume this is a 00xxxxxx byte followed by another 4 bytes
> (which is wrong). Also one can read this as "aligned, uncompressed
> data" (which again is wrong).

- * 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
+ * 00xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx, uncompressed data (up to 1G)

Maybe "00xxxxxx 4-byte length word (aligned)," is more clear about
what is aligned. Also, adding all those xxx's obscures the point that
we only need to examine one byte to figure out what to do next.

> ```
> 10000000 1-byte length word, unaligned, TOAST pointer
> ```
>
> This is misleading too. The comments above this line say that `struct
> varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16,
> plus 1 byte equals 17, right? However the documentation [1] claims the
> result should be 18:

The patch has:

+ * In the third case the va_tag field (see varattrib_1b_e) is used to discern
+ * the specific type and length of the pointer datum. On disk the "xxx" bits
+ * currently always store sizeof(varatt_external) + 2.

...so not sure where 17 came from.

- * 10000000 1-byte length word, unaligned, TOAST pointer
+ * 10000000 xxxxxxxx, TOAST pointer (struct varatt_external)

This implies that the header is two bytes, which is not accurate. That
next byte is a type tag:

/* TOAST pointers are a subset of varattrib_1b with an identifying tag byte */
typedef struct
{
uint8 va_header; /* Always 0x80 or 0x01 */
uint8 va_tag; /* Type of datum */
char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-specific data */
} varattrib_1b_e;

...and does not always represent the on-disk length:

/*
* Type tag for the various sorts of "TOAST pointer" datums. The peculiar
* value for VARTAG_ONDISK comes from a requirement for on-disk compatibility
* with a previous notion that the tag field was the pointer datum's length.
*/
typedef enum vartag_external
{
VARTAG_INDIRECT = 1,
VARTAG_EXPANDED_RO = 2,
VARTAG_EXPANDED_RW = 3,
VARTAG_ONDISK = 18
} vartag_external;

And I don't think the new comments referring to "third case", "first
two cases", etc make it easier to follow.
--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2022-08-18 04:59:29 Re: static libpq (and other libraries) overwritten on aix
Previous Message Nikita Glukhov 2022-08-18 03:45:56 Re: SQL/JSON features for v15