Re: Pluggable toaster

From: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable toaster
Date: 2022-01-05 14:45:56
Message-ID: CANbhV-F4Vffu7hWDEK+yBjquq4EQGH3JAJ+K4tjHEopFU0a-kQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 30 Dec 2021 at 16:40, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:

> We are working on custom toaster for JSONB [1], because current TOAST is
> universal for any data type and because of that it has some disadvantages:
> - "one toast fits all" may be not the best solution for particular
> type or/and use cases
> - it doesn't know the internal structure of data type, so it cannot
> choose an optimal toast strategy
> - it can't share common parts between different rows and even
> versions of rows

Agreed, Oleg has made some very clear analysis of the value of having
a higher degree of control over toasting from within the datatype.

In my understanding, we want to be able to
1. Access data from a toasted object one slice at a time, by using
knowledge of the structure
2. If toasted data is updated, then update a minimum number of
slices(s), without rewriting the existing slices
3. If toasted data is expanded, then allownew slices to be appended to
the object without rewriting the existing slices

> Modification of current toaster for all tasks and cases looks too
> complex, moreover, it will not works for custom data types. Postgres
> is an extensible database, why not to extent its extensibility even
> further, to have pluggable TOAST! We propose an idea to separate
> toaster from heap using toaster API similar to table AM API etc.
> Following patches are applicable over patch in [1]

ISTM that we would want the toast algorithm to be associated with the
datatype, not the column?
Can you explain your thinking?

We already have Expanded toast format, in-memory, which was designed
specifically to allow us to access sub-structure of the datatype
in-memory. So I was expecting to see an Expanded, on-disk, toast
format that roughly matched that concept, since Tom has already shown
us the way. (varatt_expanded). This would be usable by both JSON and
PostGIS.

Some other thoughts:

I imagine the data type might want to keep some kind of dictionary
inside the main toast pointer, so we could make allowance for some
optional datatype-specific private area in the toast pointer itself,
allowing a mix of inline and out-of-line data, and/or a table of
contents to the slices.

I'm thinking could also tackle these things at the same time:
* We want to expand TOAST to 64-bit pointers, so we can have more
pointers in a table
* We want to avoid putting the data length into the toast pointer, so
we can allow the toasted data to be expanded without rewriting
everything (to avoid O(N^2) cost)

--
Simon Riggs http://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-01-05 15:05:21 Re: refactoring basebackup.c
Previous Message Alvaro Herrera 2022-01-05 14:10:17 Re: row filtering for logical replication