Re: RFI: Extending the TOAST Pointer

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Nikita Malakhov <hukutoc(at)gmail(dot)com>
Subject: Re: RFI: Extending the TOAST Pointer
Date: 2023-05-18 10:51:58
Message-ID: CAJ7c6TNAYyeMYKVkiwOZChy7UpE_CkjpYOk73gcWTXMkLkEyzw@mail.gmail.com
Lists: pgsql-hackers

Hi Nikita,

> this part of the PostgreSQL screams to be revised and improved

I completely agree. The problem with TOAST pointers is that they are
not extensible at the moment, which prevents adding new compression
algorithms (e.g. ZSTD), new features like compression dictionaries
[1], etc. I suggest we add extensibility in order to solve this
problem for the foreseeable future, for everyone.

> where Custom TOAST Pointer is distinguished from Regular one by va_flag field
> which is a part of varlena header

I don't think that the varlena header is the best place to distinguish a
classical TOAST pointer from an extended one. On top of that, I don't
see any free bits that would allow adding such a flag to the on-disk
varlena representation [2].

The current on-disk TOAST pointer representation is the following:

```
typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
} varatt_external;
```
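
To make the "size and compression method" comment concrete: as far as I can tell from the current sources (the `VARLENA_EXTSIZE_BITS` macro), the low 30 bits of `va_extinfo` hold the external saved size and the top 2 bits hold the compression method ID. Below is a minimal standalone sketch of that packing, assuming this 30/2 split; `ext_set`, `ext_get_size` and `ext_get_method` are hypothetical names, not the real PostgreSQL macros, and `uint32_t` stands in for `uint32`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match VARLENA_EXTSIZE_BITS: 30 bits of size, 2 bits of ID. */
#define EXTSIZE_BITS    30
#define EXTSIZE_MASK    ((UINT32_C(1) << EXTSIZE_BITS) - 1)

/* Hypothetical helper: pack the external size and compression ID. */
static inline uint32_t
ext_set(uint32_t extsize, uint32_t method)
{
    assert(extsize <= EXTSIZE_MASK);    /* size must fit into 30 bits */
    assert(method <= 3);                /* only 2 bits are left for the ID */
    return extsize | (method << EXTSIZE_BITS);
}

/* Hypothetical helper: the low 30 bits hold the external saved size. */
static inline uint32_t
ext_get_size(uint32_t extinfo)
{
    return extinfo & EXTSIZE_MASK;
}

/* Hypothetical helper: the top 2 bits hold the compression method ID. */
static inline uint32_t
ext_get_method(uint32_t extinfo)
{
    return extinfo >> EXTSIZE_BITS;
}
```

For example, `ext_set(123456, 1)` (1 being `TOAST_LZ4_COMPRESSION_ID`) round-trips through `ext_get_size` and `ext_get_method` back to 123456 and 1.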

Note that currently only 2 compression methods are supported:

```
typedef enum ToastCompressionId
{
    TOAST_PGLZ_COMPRESSION_ID = 0,
    TOAST_LZ4_COMPRESSION_ID = 1,
    TOAST_INVALID_COMPRESSION_ID = 2
} ToastCompressionId;
```

I suggest adding a new flag that will mark an extended TOAST format:

```
typedef enum ToastCompressionId
{
    TOAST_PGLZ_COMPRESSION_ID = 0,
    TOAST_LZ4_COMPRESSION_ID = 1,
    TOAST_RESERVED_COMPRESSION_ID = 2,
    TOAST_HAS_EXTENDED_FORMAT = 3,
} ToastCompressionId;
```
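
With this enum, a reader can tell a classic pointer from an extended one with a single comparison on the compression ID bits of `va_extinfo`. Assuming the current 30-bit size / 2-bit ID split (as in `VARLENA_EXTSIZE_BITS`), those bits allow exactly four values, so `TOAST_HAS_EXTENDED_FORMAT = 3` takes the last one and, presumably, any further extension has to live in the bytes that follow. A minimal sketch; `toast_pointer_is_extended` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXTSIZE_BITS 30         /* assumed 30/2 split of va_extinfo */

/* The enum as proposed above. */
typedef enum ToastCompressionId
{
    TOAST_PGLZ_COMPRESSION_ID = 0,
    TOAST_LZ4_COMPRESSION_ID = 1,
    TOAST_RESERVED_COMPRESSION_ID = 2,
    TOAST_HAS_EXTENDED_FORMAT = 3,
} ToastCompressionId;

/* Every ID must fit into the 2 bits left over in va_extinfo (C11). */
static_assert(TOAST_HAS_EXTENDED_FORMAT < (1 << (32 - EXTSIZE_BITS)),
              "compression ID does not fit into va_extinfo");

/* Hypothetical helper: does this va_extinfo announce the extended format? */
static inline bool
toast_pointer_is_extended(uint32_t va_extinfo)
{
    return (va_extinfo >> EXTSIZE_BITS) == TOAST_HAS_EXTENDED_FORMAT;
}
```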

For the extended format, we add a varint (UTF-8-like) bitmask right
after varatt_external that marks the features present in this
particular instance of the pointer. The rest of the data is interpreted
depending on which bits are set. This will allow us to extend the
pointers indefinitely.
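
The exact varint encoding is not fixed by this proposal beyond "UTF-8-like"; for illustration, the sketch below swaps in a LEB128-style encoding (7 payload bits per byte, high bit as a continuation flag), which shares the relevant property that small bitmasks take a single byte. The feature bits `EXT_FEATURE_CMID` and `EXT_FEATURE_DICTIONARY`, the `ExtToastInfo` struct and all function names are hypothetical, chosen only to show how the trailing data could be interpreted depending on the bits set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits; the proposal does not define any yet. */
#define EXT_FEATURE_CMID        (UINT64_C(1) << 0)  /* 1-byte compression ID */
#define EXT_FEATURE_DICTIONARY  (UINT64_C(1) << 1)  /* 4-byte dictionary OID */
#define EXT_KNOWN_FEATURES      (EXT_FEATURE_CMID | EXT_FEATURE_DICTIONARY)

typedef struct ExtToastInfo
{
    uint64_t    features;       /* decoded bitmask */
    uint8_t     cmid;           /* valid if EXT_FEATURE_CMID is set */
    uint32_t    dict_oid;       /* valid if EXT_FEATURE_DICTIONARY is set */
} ExtToastInfo;

/* LEB128-style: 7 bits per byte, high bit set on all but the last byte. */
static inline size_t
varint_encode(uint64_t value, uint8_t *out)
{
    size_t      n = 0;

    do
    {
        uint8_t     byte = value & 0x7F;

        value >>= 7;
        if (value != 0)
            byte |= 0x80;       /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Returns bytes consumed, or 0 if the input is truncated or over 10 bytes. */
static inline size_t
varint_decode(const uint8_t *in, size_t len, uint64_t *value)
{
    uint64_t    result = 0;

    for (size_t i = 0; i < len && i < 10; i++)
    {
        result |= (uint64_t) (in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0)
        {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Parse the bytes following varatt_external in an extended pointer.
 * Returns bytes consumed, or 0 on malformed input or unknown feature bits.
 */
static inline size_t
parse_extended_tail(const uint8_t *in, size_t len, ExtToastInfo *info)
{
    size_t      pos = varint_decode(in, len, &info->features);

    if (pos == 0 || (info->features & ~EXT_KNOWN_FEATURES) != 0)
        return 0;               /* can't know the size of unknown fields */

    if (info->features & EXT_FEATURE_CMID)
    {
        if (pos + 1 > len)
            return 0;
        info->cmid = in[pos++];
    }
    if (info->features & EXT_FEATURE_DICTIONARY)
    {
        if (pos + 4 > len)
            return 0;
        memcpy(&info->dict_oid, in + pos, 4);  /* native byte order, unaligned-safe */
        pos += 4;
    }
    return pos;
}
```

Rejecting unknown bits is a deliberate choice in this sketch: a reader cannot skip a field whose length it doesn't know, so it has to treat such a pointer as unreadable unless each field carries its own length.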

Note that the proposed approach doesn't require running any
migrations. Also note that I have described only the on-disk
representation; the in-memory representation can be tweaked as we
like without affecting end users.

Thoughts?

[1]: https://commitfest.postgresql.org/43/3626/
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/postgres.h;h=0446daa0e61722067bb75aa693a92b38736e12df;hb=164d174bbf9a3aba719c845497863cd3c49a3ad0#l178

--
Best regards,
Aleksander Alekseev
