Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Pluggable toaster
Date: 2022-07-25 21:20:23
Message-ID: CAN-LCVOUPnNFV_AAtFsE9oKiWzVji8AF814DBh+C_9e_5ST+AQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers!

Here is the patch set rebased to master from 22.07. I've got some trouble
rebasing it due to
conflicts, so it took some time.
I've made some corrections according to Matthias and Aleksander comments,
though not all
of them, because some require refactoring of very old code and would have
to take much more
effort. I keep these recommended corrections in mind and already working on
them but it will
require extensive testing and some more work, so the will be presented
later, in next iteration
or next patch - these are optimization of heap AM according
table_relation_fetch_toast_slice,
double-index problem and I'm continue to straighten out code related to
TOAST functionality.
It's quite a task because as I mentioned before, this core was scattered
over Heap AM and
reference implementation of TOAST is very tightly intertwined with Heap AM
itself. Default
toaster uses Heap AM storage so it is unlikely that it will be possible to
fully detach it from
Heap.

However, I've made some more refactoring, removed empty sources, corrected
code according
to naming conventions, and extended README.toastapi document.

0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an
example (but I would,
as recommended, remove Dummy toaster and provide it as an extension), and
default Toaster
was left as-is (reference implementation).

0003_toaster_default_v9 implements reference TOAST as Default Toaster via
TOAST API,
so Heap AM calls Toast only via API, and does not have direct calls to
Toast functionality.

0004_toaster_snapshot_v8 continues refactoring and has some important
changes (added
into README.toastapi new part related TOAST API extensibility - the virtual
functions table).

Also, I'll provide documentation package corrected according to Matthias'
remarks later,
in the next patch set.

Please check attached patch set.
Also, GIT branch with this patch resides here:
https://github.com/postgrespro/postgres/tree/toasterapi_clean

Thank all reviewers for feedback!

On Sat, Jul 23, 2022 at 10:15 AM Nikita Malakhov <hukutoc(at)gmail(dot)com> wrote:

> Hi hackers!
>
> Matthias, thank you very much for your feedback!
> Sorry, I forgot to attach files.
> Attaching here, but they are for the commit tagged "15beta2", I am
> currently
> rebasing this branch onto the actual master and will provide rebased
> version,
> with some corrections according to your feedback, in a day or two.
>
> >Indexes cannot index toasted values, so why would the toaster oid be
> >interesting for index storage properties?
>
> Here Teodor might correct me. Toast tables are indexed, and Heap TOAST
> mechanics accesses Toasted tuples by index, isn't it the case?
>
> >Moving toasters seems quite expensive when compared to just index
> >checks. When you have many toasters, but only a few hot ones, this
> >currently will move the "cold" toasters around a lot. Why not use a
> >stack instead (or alternatively, a 'zipper' (or similar) data
> >structure), where the hottest toaster is on top, so that we avoid
> >larger memmove calls?
>
> This is a reasonable remark, we'll consider it for the next iteration. Our
> reason
> is that we think there won't be a lot of custom Toasters, most likely less
> then
> a dozen, for the most complex/heavy datatypes so we haven't considered
> these expenses.
>
> >To the best of my knowledge varlena-pointers are unaligned; and this
> >is affirmed by the comment immediately under varattrib_1b_e. Assuming
> >alignment to 16 bits should/would be incorrect in some of the cases.
> >This is handled for normal varatt_external values by memcpy-ing the
> >value into local (aligned) fields before using them, but that doesn't
> >seem to be the case for varatt_custom?
>
> Alignment correction seemed reasonable for us because structures are
> anyway aligned in memory, so when we use 1 and 2-byte fields along
> with 4-byte it may create a lot of padding. Am I wrong? Also, correct
> alignment somewhat simplifies access to structure fields.
>
> >0003: (re-implement default toaster using toaster interface)
> >I see significant changes to the dummy toaster (created in 0002),
> >could those be moved to patch 0002 in the next iteration?
> Will do.
>
> >detoast.c and detoast.h are effectively empty after this patch (only
> >imports and commented-out code remain), please fully remove them
> >instead - that saves on patch diff size.
> Will do.
>
> About the location of toast_ functions: these functions are part of Heap
> TOAST mechanics, and they were scattered among other Heap internals
> sources. I've tried to gather them and put them in more straight order,
> but
> this work is not fully finished yet and will take some time. Working on it.
>
> I'll check if table_relation_fetch_toast_slice could be removed, thanks for
> the remark.
>
> toast_helper - done, will be provided in rebased version.
>
> toast_internals - this one is an internal part of TOAST implemented in
> Heap AM, but I'll try to straighten it out as much as I could.
>
> naming conventions in some sources - done, will be provided in rebased
> patch set.
>
> >Shouldn't the creation of toast tables be delegated to the toaster?
>
> Yes, you're right, and actually, it is. I'll check that and correct in
> rebased
> version.
>
> >Although this is code being moved around, the comment is demonstrably
> >false: A cancelled REINDEX CONCURRENTLY with a subsequent REINDEX can
> >leave a toast relation with 2 valid indexes.
>
> This code is quite old, we've not changed it but thanks for the remark,
> I'll check it more carefully.
>
> Small fixes are already merged into larger patches in attached files. Also,
> I appreciate your feedback on documentation - if you would have an
> opportunity
> please check README provided in 0003. I've took your comments on
> documentation
> into account and will include corrections according to them into rebased
> patch.
>
> As Aleksander recommended, I've shortened the patch set and left only the
> most
> important part - API and re-implemented default Toast. All bells and
> whistles are not
> of so much importance and could be sent later after the API itself will be
> straightened
> out and commited.
>
> Thank you very much!
>
> --
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Attachment Content-Type Size
0004_toaster_snapshot_v8.patch.gz application/x-gzip 9.2 KB
0003_toaster_default_v9.patch.gz application/x-gzip 34.0 KB
0002_toaster_interface_v10.patch.gz application/x-gzip 44.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sehrope Sarkuni 2022-07-25 21:22:24 Re: Proposal to provide the facility to set binary format output for specific OID's per session
Previous Message Tom Lane 2022-07-25 20:43:01 Re: log_line_prefix: make it possible to add the search_path