Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Pluggable toaster
Date: 2022-07-20 09:15:41
Message-ID: CAN-LCVNMot2xqq2pHi4OCbgA1nK4KkRfWiJQGK9EgiTbW3O8gQ@mail.gmail.com
Lists: pgsql-hackers

Hi hackers!

We really need your feedback on the latest patch set update!

Regarding the earlier question about TOAST API overhead: please see the
attached script. It tests INSERT, UPDATE and SELECT operations, and we ran
it on both vanilla master (untouched TOAST implementation) and patched
master (default TOAST implemented via the TOAST API, with the patches from
this set up to 0005_bytea_appendable_toaster applied).
Some of the test scripts will be included in the patch set later, as an
additional patch.
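
A comparison of this kind could be sketched roughly as follows. This is not
the attached api_perf.sql; the table name, row count and value sizes are
illustrative assumptions. It deliberately uses only stock SQL, so the same
file can be run unchanged on both builds and the psql timings compared.

```sql
-- Hypothetical workload: names and sizes are assumptions, not taken
-- from the attached api_perf.sql.
\timing on

DROP TABLE IF EXISTS toast_perf;
CREATE TABLE toast_perf (id int PRIMARY KEY, payload text);

-- EXTERNAL storage disables compression, so values larger than the
-- TOAST threshold (about 2 kB) are stored out of line.
ALTER TABLE toast_perf ALTER COLUMN payload SET STORAGE EXTERNAL;

-- INSERT: every 10000-byte value goes through the toaster.
INSERT INTO toast_perf
SELECT i, repeat('a', 10000) FROM generate_series(1, 1000) AS i;

-- UPDATE: every value is detoasted, extended and toasted again.
UPDATE toast_perf SET payload = payload || repeat('b', 3000);

-- SELECT: md5() needs the full value, so every row is detoasted.
SELECT count(md5(payload)) FROM toast_perf;
```

Running the file several times on each build and comparing the reported
times per statement gives a rough picture of any overhead.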

I'm currently working on an update to the default Toaster (internal
optimizations that do not affect functionality)
and on README files explaining Pluggable TOAST.

An example of using a custom Toaster:

Custom Toaster extension definition (developer):
CREATE FUNCTION custom_toaster_handler(internal)
RETURNS toaster_handler
AS 'MODULE_PATHNAME'
LANGUAGE C;

CREATE TOASTER custom_toaster HANDLER custom_toaster_handler;

User's POV:
CREATE EXTENSION custom_toaster;
select * from pg_toaster;
  oid  |    tsrname     |       tsrhandler
-------+----------------+-------------------------
  9864 | deftoaster     | default_toaster_handler
 32772 | custom_toaster | custom_toaster_handler

CREATE TABLE tst1 (
c1 text STORAGE plain,
c2 text STORAGE external TOASTER custom_toaster,
id int4
);
ALTER TABLE tst1 ALTER COLUMN c1 SET TOASTER custom_toaster;
=# \d+ tst1
 Column |  Type   | Collation | Nullable | Default | Storage  |    Toaster     |...
--------+---------+-----------+----------+---------+----------+----------------+...
 c1     | text    |           |          |         | plain    | deftoaster     |...
 c2     | text    |           |          |         | external | custom_toaster |...
 id     | integer |           |          |         | plain    |                |...
Access method: heap

Thanks!

Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

On Wed, Jul 13, 2022 at 10:45 PM Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:

> Hi hackers!
> As previously requested, the patch branch was cleaned of garbage files,
> logs, etc. All conflict resolutions were merged
> into the patch commits where the conflicts appeared, and the branch was
> rebased to have one commit per patch. The branch was brought up to date,
> and a fresh patch set was generated.
> https://github.com/postgrespro/postgres/tree/toasterapi_clean
>
> What we propose in short:
> We suggest a way to make TOAST pluggable as Storage (similar to
> Pluggable Access Methods): we detached the TOAST
> mechanics from the Heap AM and made them an independent, pluggable and
> extensible part using our freshly developed TOAST API.
> With this patch set you will be able to develop and plug in your own TOAST mechanics
> for table columns. Knowing the internals
> and/or the workflow and workload of the data being TOASTed makes Custom Toasters much
> more efficient in performance and storage.
> We keep backwards compatibility: the default TOAST mechanics work as
> they did previously, silently handling any
> Toastable datatype
> (and TOASTed values and tables from previous versions, no changes in this),
> and are set as the default Toaster if not stated otherwise,
> but now through our TOAST API.
>
> We've already presented our work at the HighLoad, PgCon and PgConf
> conferences; you can find the materials here:
> http://www.sai.msu.su/~megera/postgres/talks/
> The testing scripts used in the talks are somewhat sparse and involve a
> lot of manual handling, so bundling them into the patch set takes
> additional work; please be patient, I'll try to do it ASAP.
>
> We have ready-to-plug-in extension Toasters:
> - bytea appendable toaster for the bytea datatype (impressive speedup of
> the bytea append operation) is included in this patch set;
> - JSONB toaster for JSONB (very cool performance improvements when
> dealing with TOASTed JSONB) will be provided later;
> - Prototype Toasters (in development) for PostGIS (much faster than
> the default with geometric data) and for large binary objects
> (like pg_largeobject, but much, much larger, and without the existing large
> object limitations); we are also currently evaluating a default
> Toaster implementation that does not use Indexes (direct access by TIDs, up
> to 3 times faster than the default on smaller values,
> less storage due to the absence of an index tree).
>
> Patch set consists of 8 incremental patches:
> 0001_create_table_storage_v5.patch - SQL syntax fix for CREATE TABLE
> clause, processing SET STORAGE... correctly;
> This patch is already being discussed in a separate thread;
>
> 0002_toaster_interface_v8.patch - TOAST API interface and SQL syntax
> allowing creation of custom Toaster (CREATE TOASTER ...)
> and setting Toaster to a table column (CREATE TABLE t (data bytea STORAGE
> EXTERNAL TOASTER bytea_toaster);)
>
> 0003_toaster_default_v7.patch - Default TOAST implemented via TOAST API;
>
> 0004_toaster_snapshot_v7.patch - refactoring of Default TOAST and support
> for versioned Toast rows;
>
> 0005_bytea_appendable_toaster_v7.patch - contrib module
> bytea_appendable_toaster - special Toaster for bytea datatype with
> customized append operation;
>
> 0006_toasterapi_docs_v3.patch - documentation package for Pluggable TOAST;
>
> 0007_fix_alignment_of_custom_toast_pointers_v3.patch - fixes the alignment
> of custom toast pointers
> required by the bytea toaster (by Nikita Glukhov);
>
> 0008_fix_toast_tuple_externalize_v3.patch - fixes the toast_tuple_externalize
> function
> so that it does not call toast if the old data is the same as the new data.
>
> An example of using the TOAST API:
> CREATE EXTENSION bytea_toaster;
> CREATE TABLE test_bytea_append (id int, a bytea STORAGE EXTERNAL);
> ALTER TABLE test_bytea_append ALTER a SET TOASTER bytea_toaster;
> INSERT INTO test_bytea_append SELECT i, repeat('a', 10000)::bytea FROM
> generate_series(1, 10) i;
> UPDATE test_bytea_append SET a = a || repeat('b', 3000)::bytea;
>
> This patch set opens the following issues:
> 1) With TOAST independent of the AM that uses it, it makes sense to move
> compression from the AM into the Toaster and make Compression one of the
> Toaster's options.
> In fact, Toasters allow any compression method to be used independently of
> the AM;
> 2) Implement default Toaster without using Indexes (currently in
> development)?
> 3) It allows different, SQL-accessed large objects of almost infinite size IN
> DATABASE, unlike the current large_object functionality, and does not limit
> their quantity;
> 4) Several already developed Toasters show impressive results for
> datatypes they were designed for.
>
> We're awaiting feedback.
>
> Regards,
> Nikita Malakhov
> Postgres Professional
> https://postgrespro.ru/
>
> On Mon, Jul 11, 2022 at 3:03 PM Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:
>
>> Hi!
>> We have a branch with incremental commits from which the patches were
>> generated with format-patch -
>> https://github.com/postgrespro/postgres/tree/toasterapi_clean
>> I'll clean the garbage files out of the commits ASAP; sorry, I hadn't
>> noticed them while moving the changes.
>>
>> Best regards,
>> Nikita Malakhov
>>
>> On Fri, Jul 1, 2022 at 3:27 PM Matthias van de Meent <
>> boekewurm+postgres(at)gmail(dot)com> wrote:
>>
>>> On Thu, 30 Jun 2022 at 22:26, Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:
>>> >
>>> > Hi hackers!
>>> > Here is the patch set rebased onto current master (15 rel beta 2 with
>>> commit from 29.06).
>>>
>>> Thanks!
>>>
>>> > Just to remind:
>>> > With this patch set you will be able to develop and plug in your own
>>> TOAST mechanics for table columns. Knowing internals and/or workflow and
>>> workload
>>> > of data being TOASTed makes Custom Toasters much more efficient in
>>> performance and storage.
>>>
>>> The new toast API doesn't seem to be very well documented, nor are the
>>> new features. Could you include a README or extend the comments on how
>>> this is expected to work, and/or how you expect people to use (the
>>> result of) `get_vtable`?
>>>
>>> > Patch set consists of 9 incremental patches:
>>> > [...]
>>> > 0002_toaster_interface_v7.patch - TOAST API interface and SQL syntax
>>> allowing creation of custom Toaster (CREATE TOASTER ...)
>>> > and setting Toaster to a table column (CREATE TABLE t (data bytea
>>> STORAGE EXTERNAL TOASTER bytea_toaster);)
>>>
>>> This patch 0002 seems to include changes to log files (!) that don't
>>> exist in current HEAD, but at the same time are not created by patch
>>> 0001. Could you please check and sanitize your patches to ensure that
>>> the changes are actually accurate?
>>>
>>> Like Robert Haas mentioned earlier[0], please create a branch in a git
>>> repository that has a commit containing the changes for each patch,
>>> and then use git format-patch to generate a single patchset, one that
>>> shares a single version number. Keeping track of what patches are
>>> needed to test this CF entry is already quite difficult due to the
>>> number of patches and their packaging (I'm having trouble managing
>>> these separate .patch.gz), and the different version tags definitely
>>> don't help in finding the correct set of patches to apply once
>>> downloaded.
>>>
>>> Kind regards,
>>>
>>> Matthias van de Meent
>>>
>>> [0]
>>> https://www.postgresql.org/message-id/CA%2BTgmoZBgNipyKuQAJzNw2w7C9z%2B2SMC0SAHqCnc_dG1nSLNcw%40mail.gmail.com
>>>

Attachment    | Content-Type             | Size
api_perf.sql  | application/octet-stream | 2.8 KB