Pluggable toaster

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Pluggable toaster
Date: 2021-12-30 16:40:09
Message-ID: 224711f9-83b7-a307-b17f-4457ab73aa0a@sigaev.ru
Lists: pgsql-hackers

Hi!

We are working on a custom toaster for JSONB [1], because the current TOAST is
universal for any data type and, because of that, has some disadvantages:
  - "one toast fits all" may not be the best solution for a particular
    type and/or use case
  - it doesn't know the internal structure of the data type, so it cannot
    choose an optimal toast strategy
  - it can't share common parts between different rows, or even between
    versions of the same row

Modifying the current toaster for all tasks and cases looks too
complex; moreover, it will not work for custom data types. Postgres
is an extensible database, so why not extend its extensibility even
further and have pluggable TOAST? We propose to separate the
toaster from the heap using a toaster API similar to the table AM API, etc.
The following patches apply on top of the patch in [1].

1) 1_toaster_interface_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_interface
Introduces syntax for storage and a formal toaster API. Adds a column
atttoaster to pg_attribute. By design, this column must not be
InvalidOid for any toastable data type, i.e. it must hold a valid toaster
oid for any type (not column) with non-plain storage. Since a toaster may
support only particular data types, the core should check that the
assigned toaster is correct using the toaster's validate method. New
commands can be found in src/test/regress/sql/toaster.sql.

The on-disk toast pointer now has one more possible struct,
varatt_custom, with a fixed header and a variable tail that is used as
storage by custom toasters. The format of the built-in toaster is kept
unchanged to allow simple pg_upgrade logic.
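A rough C sketch of a "fixed header plus variable tail" pointer, assuming hypothetical field names loosely modeled on varatt_external; the real varatt_custom layout may differ. The sketch also omits the 1-byte varlena header and vartag that real PostgreSQL toast pointers carry in front of the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Oid;

/* Hypothetical layout: fixed header followed by a toaster-defined tail. */
typedef struct varatt_custom_sketch
{
    uint16_t va_toasterdatalen;  /* length of va_toasterdata in bytes */
    uint32_t va_rawsize;         /* size of the original (untoasted) value */
    Oid      va_toasterid;       /* which toaster owns this pointer */
    char     va_toasterdata[];   /* opaque, toaster-specific tail */
} varatt_custom_sketch;

/* Build a pointer carrying 'len' bytes of toaster-private data. */
static varatt_custom_sketch *
make_custom_pointer(Oid toasterid, uint32_t rawsize,
                    const void *data, uint16_t len)
{
    varatt_custom_sketch *p =
        malloc(offsetof(varatt_custom_sketch, va_toasterdata) + len);

    if (p == NULL)
        return NULL;
    p->va_toasterdatalen = len;
    p->va_rawsize = rawsize;
    p->va_toasterid = toasterid;
    memcpy(p->va_toasterdata, data, len);
    return p;
}

/* Total footprint: fixed header plus the variable tail. */
static size_t
custom_pointer_size(const varatt_custom_sketch *p)
{
    assert(p != NULL);
    return offsetof(varatt_custom_sketch, va_toasterdata) + p->va_toasterdatalen;
}
```

The core only needs the fixed header (in particular the toaster id) to dispatch; the tail is interpreted solely by the owning toaster.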

Since a column's toaster can be changed during the table's lifetime, we had
two options for the drop-toaster operation:
  - allow dropping: if a column's toaster has been changed, then we need to
    re-toast all values, which could be extremely expensive. In any case,
    functions/operators should be ready to work with values toasted by
    different toasters, although any toaster should support simple
    toast/detoast operations, which allows any existing code to
    work with the new approach. Tracking dependencies between toasters
    and rows looks like a bad idea.
  - disallow dropping toasters. We don't believe there will be many
    toasters at the same time (the number of AMs isn't very high either,
    and we don't believe that will change significantly in the near
    future), so prohibiting toaster drops looks reasonable.
In this patch set we chose the second option.

The toaster API includes a get_vtable method, intended to give access to
custom toaster features that aren't covered by this API. The idea is
that a toaster returns a structure with some values and/or pointers to the
toaster's methods, and the caller can use it for particular purposes (see
patch 4). The kind of structure is identified by a magic number, which
should be the first field of the structure.
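A minimal C sketch of the magic-number convention, assuming a hypothetical vtable for an appendable bytea toaster. The struct, the magic value and the append method are illustrative names, not taken from the patch. The caller reads the leading magic before casting the opaque pointer; this is valid C because a pointer to a struct may be converted to a pointer to its first member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value identifying this kind of vtable. */
#define BYTEA_APPEND_VTABLE_MAGIC 0x42415050u

typedef struct ByteaAppendVtable
{
    uint32_t magic;                                /* must be first */
    int    (*append) (const char *tail, size_t len); /* hypothetical */
} ByteaAppendVtable;

static_assert(offsetof(ByteaAppendVtable, magic) == 0,
              "magic must be the first field");

/* Placeholder method: just reports how many bytes it was given. */
static int
demo_append(const char *tail, size_t len)
{
    (void) tail;
    return (int) len;
}

static const ByteaAppendVtable demo_vtable = {
    .magic = BYTEA_APPEND_VTABLE_MAGIC,
    .append = demo_append,
};

/* Hypothetical get_vtable callback: returns an opaque pointer. */
static const void *
demo_get_vtable(void)
{
    return &demo_vtable;
}

/* Caller side: check the leading magic before trusting the cast. */
static const ByteaAppendVtable *
as_bytea_append_vtable(const void *vt)
{
    if (vt == NULL || *(const uint32_t *) vt != BYTEA_APPEND_VTABLE_MAGIC)
        return NULL;
    return (const ByteaAppendVtable *) vt;
}
```

Callers that don't recognize the magic simply get NULL back and fall back to the generic API.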

contrib/dummy_toaster is also added to simplify testing.

psql and pg_dump are modified to support the toaster object concept.

2) 2_toaster_default_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_default
The built-in toaster is implemented (with some refactoring) using the
toaster API as a generic (or default) toaster. dummy_toaster here is a
minimal working example: it stores the value directly in the toast pointer
and fails if the value is larger than 1kB.
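The dummy_toaster behaviour described above can be approximated by this C sketch. Assumptions: "1kB" is taken as 1024 bytes, the pointer layout and function names are hypothetical, and where the real module fails, this sketch signals failure by returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DUMMY_TOASTER_MAX_SIZE 1024   /* assumed meaning of "1kb" */

/* Hypothetical pointer: the whole value is kept inline. */
typedef struct DummyToastPointer
{
    size_t len;
    char   data[];
} DummyToastPointer;

/* "Toast": copy the value into the pointer; refuse values over the limit. */
static DummyToastPointer *
dummy_toast(const char *value, size_t len)
{
    DummyToastPointer *p;

    if (len > DUMMY_TOASTER_MAX_SIZE)
        return NULL;            /* failure is signalled with NULL here */
    p = malloc(offsetof(DummyToastPointer, data) + len);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->data, value, len);
    return p;
}

/* "Detoast": return a freshly allocated copy of the inline value. */
static char *
dummy_detoast(const DummyToastPointer *p, size_t *len_out)
{
    char *copy;

    assert(p != NULL && len_out != NULL);
    copy = malloc(p->len ? p->len : 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, p->data, p->len);
    *len_out = p->len;
    return copy;
}
```

Since nothing leaves the pointer, this toaster needs no toast relation at all, which is what makes it convenient for exercising the API.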

3) 3_toaster_snapshot_v1.patch.gz
https://github.com/postgrespro/postgres/tree/toaster_snapshot
The patch implements a mechanism to distinguish row versions in toasted
values, so that common parts of toasted values can be shared between
different versions of rows.

4) 4_bytea_appendable_toaster_v1.patch.gz
https://github.com/postgrespro/postgres/tree/bytea_appendable_toaster
A contrib module implements a toaster for non-compressed bytea columns
that allows fast appending to an existing bytea value. The appended tail
is stored directly in the toast pointer if there is enough room for it.
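A hypothetical C sketch of the "append into the pointer if it fits" idea. The inline capacity, struct and function names are assumptions, and the externally toasted base value is simulated by a plain in-memory buffer. When the tail does not fit, the sketch just reports failure; a real toaster would have to rewrite the value instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define INLINE_TAIL_CAPACITY 32   /* hypothetical room left in the pointer */

typedef struct AppendableByteaPointer
{
    const char *base;       /* stands in for the externally toasted data */
    size_t      base_len;
    size_t      tail_len;   /* bytes appended since the value was toasted */
    char        tail[INLINE_TAIL_CAPACITY];
} AppendableByteaPointer;

/* Append in place if the new bytes fit in the inline tail. */
static bool
appendable_try_append(AppendableByteaPointer *p, const char *data, size_t len)
{
    assert(p != NULL && p->tail_len <= INLINE_TAIL_CAPACITY);
    if (len > INLINE_TAIL_CAPACITY - p->tail_len)
        return false;       /* caller must fall back to a full rewrite */
    memcpy(p->tail + p->tail_len, data, len);
    p->tail_len += len;
    return true;
}

/* Logical value = base bytes followed by the inline tail.
 * Returns the total length, or 0 if 'out' is too small. */
static size_t
appendable_read(const AppendableByteaPointer *p, char *out, size_t outsz)
{
    size_t total = p->base_len + p->tail_len;

    if (total > outsz)
        return 0;
    memcpy(out, p->base, p->base_len);
    memcpy(out + p->base_len, p->tail, p->tail_len);
    return total;
}
```

The appeal is that a small append touches only the pointer stored in the row, not the toasted chunks.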

Note: the patch modifies byteacat() to support the contrib toaster. This
looks ugly; the contrib module should probably create a new concatenation
function instead.

We are open to any questions, discussion, objections and advice.
Thank you.

People behind this work:
Oleg Bartunov
Nikita Gluhov
Nikita Malakhov
Teodor Sigaev

[1]
https://www.postgresql.org/message-id/flat/de83407a-ae3d-a8e1-a788-920eb334f25b(at)sigaev(dot)ru

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
4_bytea_appendable_toaster_v1.patch.gz application/gzip 5.7 KB
3_toaster_snapshot_v1.patch.gz application/gzip 6.3 KB
2_toaster_default_v1.patch.gz application/gzip 26.4 KB
1_toaster_interface_v1.patch.gz application/gzip 41.3 KB
