Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Pluggable toaster
Date: 2022-07-13 19:45:40
Message-ID: CAN-LCVNkU+kdieu4i_BDnLgGszNY1RCnL6Dsrdz44fY7FOG3vg@mail.gmail.com
Lists: pgsql-hackers

Hi hackers!
As requested, the patch branch has been cleaned of garbage files, logs,
etc. All conflict resolutions were merged into the patch commits where they
belong, and the branch was rebased so that each patch corresponds to exactly
one commit. The branch was brought up to date, and a fresh patch set was
generated.
https://github.com/postgrespro/postgres/tree/toasterapi_clean

What we propose in short:
We suggest a way to make TOAST pluggable as Storage (in a way similar to
Pluggable Access Methods): we detached the TOAST mechanics from the Heap AM
and made them an independent, pluggable and extensible part with our freshly
developed TOAST API.
With this patch set you will be able to develop and plug in your own TOAST
mechanics for table columns. Knowing the internals, workflow and workload of
the data being TOASTed makes Custom Toasters much more efficient in
performance and storage.
We keep backwards compatibility: the default TOAST mechanics work as they
did previously, silently handling any toastable datatype (and TOASTed values
and tables from previous versions; nothing changes there), and they are used
as the default Toaster unless stated otherwise, but now through our TOAST API.
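To illustrate the general idea of a Toaster chosen per column, here is a minimal, self-contained C sketch. It is not code from the patch set: ToasterRoutine, get_toaster(), the toaster names and the 2000-byte threshold are hypothetical simplifications. It only shows the shape of the design: each Toaster supplies its own callbacks through a routine table (similar in spirit to how table access methods expose their routines), and a column with no explicitly set Toaster falls back to the default one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical routine table: each toaster supplies its own callbacks. */
typedef struct ToasterRoutine
{
    const char *name;
    /* Decide how a value of the given size is stored; returns a label. */
    const char *(*toast) (size_t len);
} ToasterRoutine;

/* Hypothetical default toaster: large values go out of line.
 * The 2000-byte threshold is a simplification for illustration. */
static const char *
default_toast(size_t len)
{
    return len > 2000 ? "default: chunked into toast table" : "inline";
}

/* Hypothetical custom toaster with its own storage strategy. */
static const char *
bytea_append_toast(size_t len)
{
    (void) len;
    return "bytea_toaster: appendable chunk list";
}

/* Registry of known toasters (in a real system this would be a catalog). */
static const ToasterRoutine toasters[] = {
    {"default_toaster", default_toast},
    {"bytea_toaster", bytea_append_toast},
};

/* Look up the toaster configured for a column.
 * NULL means "no toaster stated", so the default one is used. */
static const ToasterRoutine *
get_toaster(const char *column_toaster)
{
    const char *wanted = column_toaster ? column_toaster : "default_toaster";

    for (size_t i = 0; i < sizeof(toasters) / sizeof(toasters[0]); i++)
        if (strcmp(toasters[i].name, wanted) == 0)
            return &toasters[i];
    return NULL;                /* unknown toaster name */
}
```

The caller never needs to know which Toaster it got; it only calls through the routine table, which is what lets new Toasters be added without touching the code that stores tuples.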

We've already presented our work at the HighLoad, PgCon and PgConf
conferences; you can find the materials here:
http://www.sai.msu.su/~megera/postgres/talks/
The testing scripts used in the talks are a bit rough and require a lot of
manual handling, so bundling them into the patch set is another piece of
work. Please be patient, I'll try to do it ASAP.

We have extension Toasters that are ready to plug in:
- the bytea appendable toaster for the bytea datatype (impressive speedup
with the bytea append operation) is included in this patch set;
- the JSONB toaster for JSONB (very cool performance improvements when
dealing with TOASTed JSONB) will be provided later;
- prototype Toasters (in development) for PostGIS (much faster than the
default with geometric data) and for large binary objects (like
pg_largeobject, but much, much larger, and without the existing large
object limitations). We are also currently checking a default Toaster
implementation that does not use indexes (direct access by TIDs, up to
3 times faster than the default on smaller values, and less storage due to
the absence of an index tree).

The patch set consists of 8 incremental patches:
0001_create_table_storage_v5.patch - SQL syntax fix for the CREATE TABLE
clause, processing SET STORAGE... correctly;
this patch is already being discussed in a separate thread;

0002_toaster_interface_v8.patch - TOAST API interface and SQL syntax
allowing the creation of a custom Toaster (CREATE TOASTER ...)
and setting a Toaster for a table column (CREATE TABLE t (data bytea
STORAGE EXTERNAL TOASTER bytea_toaster);)

0003_toaster_default_v7.patch - Default TOAST implemented via TOAST API;

0004_toaster_snapshot_v7.patch - refactoring of Default TOAST and support
for versioned Toast rows;

0005_bytea_appendable_toaster_v7.patch - contrib module
bytea_appendable_toaster - special Toaster for bytea datatype with
customized append operation;

0006_toasterapi_docs_v3.patch - documentation package for Pluggable TOAST;

0007_fix_alignment_of_custom_toast_pointers_v3.patch - fixes the alignment
of custom toast pointers required by the bytea toaster, by Nikita Glukhov;

0008_fix_toast_tuple_externalize_v3.patch - fixes the toast_tuple_externalize
function so that it does not call the toaster if the old data is the same as
the new data.
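The idea behind 0008 can be sketched with a simplified, hypothetical C model. This is not the patch's code: Value, externalize_for_update() and the id counter are invented for illustration, and the real check may compare toast pointers rather than raw bytes. If the new value of a column is identical to the old, already toasted one, the existing toast entry is reused instead of writing a new copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a column value: raw bytes plus, if it was
 * already stored out of line, an identifier of the existing toast entry. */
typedef struct Value
{
    const char *data;
    size_t      len;
    int         toast_id;       /* 0 = not toasted yet */
} Value;

static int next_toast_id = 100; /* hypothetical toast entry allocator */

/* Externalize new_val during an UPDATE. If it is byte-for-byte equal to
 * the old, already toasted value, reuse the old toast entry (no toaster
 * call); otherwise store a fresh copy. Returns the toast id to reference. */
static int
externalize_for_update(const Value *old_val, const Value *new_val)
{
    if (old_val != NULL && old_val->toast_id != 0 &&
        old_val->len == new_val->len &&
        memcmp(old_val->data, new_val->data, new_val->len) == 0)
        return old_val->toast_id;   /* unchanged: keep the existing entry */

    return next_toast_id++;         /* changed: toast a new copy */
}
```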

An example of using the TOAST API:
CREATE EXTENSION bytea_toaster;
CREATE TABLE test_bytea_append (id int, a bytea STORAGE EXTERNAL);
ALTER TABLE test_bytea_append ALTER a SET TOASTER bytea_toaster;
INSERT INTO test_bytea_append SELECT i, repeat('a', 10000)::bytea FROM
generate_series(1, 10) i;
UPDATE test_bytea_append SET a = a || repeat('b', 3000)::bytea;
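A rough model of why an append-aware Toaster can help with the UPDATE above. It assumes, hypothetically, that the appendable toaster keeps existing full chunks and writes only a trailing partial chunk plus the newly appended bytes, and it uses a simplified 2000-byte chunk size; the actual bytea toaster's storage layout may differ. With a 10000-byte value and a 3000-byte append, rebuilding the whole value writes 7 chunks, while the append-aware path writes 2.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE 2000         /* hypothetical, simplified chunk size */

/* Number of chunks needed to hold len bytes. */
static size_t
nchunks(size_t len)
{
    return (len + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/* Naive path: a || b produces a brand new value that is toasted from
 * scratch, so every chunk of the result is written again. */
static size_t
chunks_written_rewrite(size_t old_len, size_t add_len)
{
    return nchunks(old_len + add_len);
}

/* Append-aware path (assumed): full chunks of the old value stay as they
 * are; only a trailing partial chunk and the new bytes are written. */
static size_t
chunks_written_append(size_t old_len, size_t add_len)
{
    size_t      untouched = old_len / CHUNK_SIZE;

    return nchunks(old_len + add_len) - untouched;
}
```

The gap grows with the size of the existing value: the naive path is proportional to the whole value, the append-aware path only to what was appended.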

This patch set opens up the following topics:
1) With TOAST independent of the AM that uses it, it makes sense to move
compression from the AM into the Toaster and make compression one of the
Toaster's options.
In fact, Toasters already allow using any compression method independently
of the AM;
2) Implementing a default Toaster that does not use indexes (currently in
development)?
3) Toasters allow different, SQL-accessible large objects of almost infinite
size IN DATABASE, unlike the current large object functionality, and do not
limit their quantity;
4) Several already developed Toasters show impressive results for the
datatypes they were designed for.

We're awaiting feedback.

Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

On Mon, Jul 11, 2022 at 3:03 PM Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:

> Hi!
> We have a branch with incremental commits from which the patches were
> generated with format-patch -
> https://github.com/postgrespro/postgres/tree/toasterapi_clean
> I'll clean the garbage files out of the commits ASAP; sorry, I didn't
> notice them while moving the changes.
>
> Best regards,
> Nikita Malakhov
>
> On Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <
> boekewurm+postgres(at)gmail(dot)com> wrote:
>
>> On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:
>> >
>> > Hi hackers!
>> > Here is the patch set rebased onto current master (15 rel beta 2 with
>> commit from 29.06).
>>
>> Thanks!
>>
>> > Just to remind:
>> > With this patch set you will be able to develop and plug in your own
>> TOAST mechanics for table columns. Knowing internals and/or workflow and
>> workload
>> > of data being TOASTed makes Custom Toasters much more efficient in
>> performance and storage.
>>
>> The new toast API doesn't seem to be very well documented, nor are the
>> new features. Could you include a README or extend the comments on how
>> this is expected to work, and/or how you expect people to use (the
>> result of) `get_vtable`?
>>
>> > Patch set consists of 9 incremental patches:
>> > [...]
>> > 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax
>> allowing creation of custom Toaster (CREATE TOASTER ...)
>> > and setting Toaster to a table column (CREATE TABLE t (data bytea
>> STORAGE EXTERNAL TOASTER bytea_toaster);)
>>
>> This patch 0002 seems to include changes to log files (!) that don't
>> exist in current HEAD, but at the same time are not created by patch
>> 0001. Could you please check and sanitize your patches to ensure that
>> the changes are actually accurate?
>>
>> Like Robert Haas mentioned earlier[0], please create a branch in a git
>> repository that has a commit containing the changes for each patch,
>> and then use git format-patch to generate a single patchset, one that
>> shares a single version number. Keeping track of what patches are
>> needed to test this CF entry is already quite difficult due to the
>> amount of patches and their packaging (I'm having trouble managing
>> these separate .patch.gz files), and the different version tags definitely
>> don't help in finding the correct set of patches to apply once
>> downloaded.
>>
>> Kind regards,
>>
>> Matthias van de Meent
>>
>> [0]
>> https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com
>>

Attachment                                               Content-Type        Size
0001_create_table_storage_v5.patch.gz                    application/x-gzip  4.2 KB
0002_toaster_interface_v8.patch.gz                       application/x-gzip  44.8 KB
0003_toaster_default_v7.patch.gz                         application/x-gzip  28.4 KB
0004_toaster_snapshot_v7.patch.gz                        application/x-gzip  7.1 KB
0005_bytea_appendable_toaster_v7.patch.gz                application/x-gzip  6.3 KB
0006_toasterapi_docs_v3.patch.gz                         application/x-gzip  3.9 KB
0007_fix_alignment_of_custom_toast_pointers_v3.patch.gz  application/x-gzip  801 bytes
0008_fix_toast_tuple_externalize_v3.patch.gz             application/x-gzip  584 bytes
