Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jacob Champion <jchampion(at)timescale(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Pluggable toaster
Date: 2022-10-24 14:44:35
Message-ID: CAN-LCVPv-aTC5iH4=nx_q4rymkLt5RXWM=qP9smczmFiwMmwJw@mail.gmail.com
Lists: pgsql-hackers

Hi!

>From personal experience with the project I have serious doubts this
>is going to happen. Before such invasive changes are going to be
>accepted there should be a clear understanding of how exactly TOASTers
>are supposed to be used. This should be part of the documentation in
>the patchset. Additionally there should be an open-source or
>source-available extension that actually demonstrates the benefits of
>TOASTers with reproducible benchmarks (we didn't even get to that part
>yet).

Actually, there is a documentation part in the patchset. There is also a
README file explaining the API.
In addition, we have several custom TOAST implementations with some
results - they were published and presented at PGCon. I was asked to exclude
custom TOAST implementations and some further improvements from the first
iteration, which is why the patchset currently consists of only 3 patches -
base core changes, the default TOAST implementation via the TOAST API, and the
documentation package.

>What other use cases for TOAST do you have in mind?

The main use case is the same as for the TOAST mechanism - storing and
retrieving oversized data. But we expanded this case with some details:
- update TOASTed data (yes, the current TOAST implementation cannot update
stored data - it marks the whole TOASTed object as dead and stores a new one);
- retrieve part of the stored data chunks without fully de-TOASTing the stored
data (even with existing TOAST this is painful if you have to get just a small
part of an object several hundred MB in size);
- be able to store objects larger than 1 GB;
- store more than 4 TB of TOASTed data for one table;
- optimize storage for fast search and retrieval of parts of a TOASTed object -
this is a must-have for using JSON effectively; PostgreSQL is already playing
catch-up in JSON performance.

For all these cases we have test results that show improvements in storage
and performance.
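
To make the partial-retrieval item above more concrete, here is a minimal
sketch of the arithmetic involved when a value is stored as fixed-size chunks:
given a byte offset and length, only the chunks covering that range need to be
read. This is not code from the patchset. The names are hypothetical and the
chunk size of 1996 bytes is an assumed value for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed chunk payload size, for illustration only. */
#define SKETCH_CHUNK_SIZE 1996

/* Hypothetical result type: which chunks cover a requested byte range. */
typedef struct ChunkRange
{
    size_t first_chunk;   /* index of the first chunk to read */
    size_t last_chunk;    /* index of the last chunk to read (inclusive) */
    size_t skip_in_first; /* bytes to skip at the start of the first chunk */
} ChunkRange;

/* Map a byte range [offset, offset + length) to the chunks that contain it. */
static ChunkRange
sketch_chunk_range(size_t offset, size_t length)
{
    ChunkRange r;

    assert(length > 0); /* an empty slice needs no chunks at all */
    r.first_chunk = offset / SKETCH_CHUNK_SIZE;
    r.last_chunk = (offset + length - 1) / SKETCH_CHUNK_SIZE;
    r.skip_in_first = offset % SKETCH_CHUNK_SIZE;
    return r;
}
```

With these assumptions, reading 100 bytes at offset 4000 touches a single
chunk, however large the whole object is.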

>To clarify, the concern about "N TOASTers vs M TableAM" was expressed
>by Robert Haas back in Jan 2022:

>> I agree ... but I'm also worried about what happens when we have
>> multiple table AMs. One can imagine a new table AM that is
>> specifically optimized for TOAST which can be used with an existing
>> heap table. One can imagine a new table AM for the main table that
>> wants to use something different for TOAST. So, I don't think it's
>> right to imagine that the choice of TOASTer depends solely on the
>> column data type. I'm not really sure how this should work exactly ...
>> but it needs careful thought.

>This is the most important open question so far to my knowledge. It
>was never addressed, it doesn't seem like there is a plan of doing so,
>the suggested alternative approach was ignored, nor are there any
>strong arguments that would defend this design choice and/or criticize
>the alternative one (other than general words "don't worry we know
>what we are doing").

>This is what I mean by the community feedback being discarded.

Maybe there was some misunderstanding. I was new to this project and
company at that time - I was introduced to it in mid-December
2021 - but Teodor Sigaev gave an answer to Mr. Haas:

>Right. That's why we propose a validate method (maybe it's the wrong
>name, but I don't know a better one) which accepts several arguments, one
>of which is the table AM oid. If that method returns false then the toaster
>isn't useful with the current TAM, storage and/or compression kinds, etc.

This is a general and, from an OOP point of view, correct means of
ensuring that a concrete TOAST implementation is valid for the Table AM
calling it.
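
As a rough illustration of that idea, a toaster could expose a validate
callback that receives the table AM oid together with the storage and
compression kinds, and the caller would only use the toaster when the callback
returns true. The sketch below is not the patchset's API. Every name in it is
hypothetical, the argument list is an assumption based on the description
above, and the AM oids are made-up values rather than real catalog OIDs.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Made-up identifiers for illustration; not real catalog OIDs. */
#define SKETCH_HEAP_AM_OID     ((Oid) 1001)
#define SKETCH_COLUMNAR_AM_OID ((Oid) 1002)

/* Hypothetical toaster descriptor carrying a validate callback. */
typedef struct SketchToaster
{
    const char *name;
    /* Returns true if this toaster can serve the given combination. */
    bool (*validate)(Oid table_am_oid, char storage, char compression);
} SketchToaster;

/* Example policy: accept only the heap-like AM, and never plain ('p') storage. */
static bool
heap_only_validate(Oid table_am_oid, char storage, char compression)
{
    (void) compression; /* this example policy ignores the compression kind */
    return table_am_oid == SKETCH_HEAP_AM_OID && storage != 'p';
}

static const SketchToaster heap_only_toaster = {"heap_only", heap_only_validate};

/* Caller side: a toaster without a validate callback is treated as unusable. */
static bool
sketch_can_use_toaster(const SketchToaster *t, Oid table_am_oid,
                       char storage, char compression)
{
    return t != NULL && t->validate != NULL &&
           t->validate(table_am_oid, storage, compression);
}
```

In this shape the choice is made per toaster and per table AM, so a toaster
written for one AM simply rejects the others instead of the core having to
know every combination in advance.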

On Mon, Oct 24, 2022 at 4:53 PM Aleksander Alekseev <aleksander(at)timescale(dot)com> wrote:

> Hi Nikita,
>
> > Using Table AM Routine and routing AM methods calls via it is a topic for further discussion,
> > if Pluggable TOAST will be committed. [...] And even then it would be an open issue.
>
> From personal experience with the project I have serious doubts this
> is going to happen. Before such invasive changes are going to be
> accepted there should be a clear understanding of how exactly TOASTers
> are supposed to be used. This should be part of the documentation in
> the patchset. Additionally there should be an open-source or
> source-available extension that actually demonstrates the benefits of
> TOASTers with reproducible benchmarks (we didn't even get to that part
> yet).
>
> > TOAST implementation is not necessary for Table AM.
>
> What other use cases for TOAST do you have in mind?
>
> >> > Have I answered your question? Please don't hesitate to point to any unclear
> >> > parts, I'd be glad to explain that.
> >>
> >> No. To be honest, it looks like you are merely discarding most/any
> >> feedback the community provided so far.
> >>
> >> I really think that pluggable TOASTers would be a great feature.
> >> However if the goal is to get it into the core I doubt that we are
> >> going to make much progress with the current approach.
>
> To clarify, the concern about "N TOASTers vs M TableAM" was expressed
> by Robert Haas back in Jan 2022:
>
> > I agree ... but I'm also worried about what happens when we have
> > multiple table AMs. One can imagine a new table AM that is
> > specifically optimized for TOAST which can be used with an existing
> > heap table. One can imagine a new table AM for the main table that
> > wants to use something different for TOAST. So, I don't think it's
> > right to imagine that the choice of TOASTer depends solely on the
> > column data type. I'm not really sure how this should work exactly ...
> > but it needs careful thought.
>
> This is the most important open question so far to my knowledge. It
> was never addressed, it doesn't seem like there is a plan of doing so,
> the suggested alternative approach was ignored, nor are there any
> strong arguments that would defend this design choice and/or criticize
> the alternative one (other than general words "don't worry we know
> what we are doing").
>
> This is what I mean by the community feedback being discarded.
>
> --
> Best regards,
> Aleksander Alekseev
>

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
