Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jacob Champion <jchampion(at)timescale(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Pluggable toaster
Date: 2022-10-24 11:16:07
Message-ID: CAN-LCVNvOTHJQSzJZjKyzshQdFeSNh8TSHy6p9PEUPL393ic7Q@mail.gmail.com
Lists: pgsql-hackers

Hi!

>I don't argue with most of what you say. I am just pointing out the
>reason why the chosen approach "N TOASTers x M TableAMs" will not
>work:

We assume that the TAM used by a custom Toaster works as it should,
and leave the TAM internals to the developer of that TAM - say, we do not
want to change the internals of Heap AM.

We don't want to create some kind of silver bullet. There are existing,
widely known problems with TOAST mechanics (seen in production
environments), and we suggest a not too complex way to solve them.

As I mentioned before, Pluggable TOAST does not change Heap AM, and it is
not intended to change TAMs.

>This is what I meant above when talking about the framework for
>simplifying this task:

That's a kind of generalization of custom TOAST implementations. It is a very
good intention, but keep in mind that different kinds of data require very
different approaches to external storage - say, JSON TOAST works with
maps of keys and values, while the super binary object (experimental name)
does not work with the internals of TOASTed data except for searching. But we
thought about that too, and the reusable code resides in the toast_internals.c
source - any custom Toaster working with Heap could use its insert, update
and fetch methods, but deal with the data on its own.

Even with the general framework there must be a common interface which
would be the entry point for those custom methods developed with the
framework. That's what the TOAST API is - just an interface that all custom
TOAST implementations use to have a common entry point from any TAM,
with syntax support to plug in custom TOAST implementations from SQL.
No less, but no more.
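
To illustrate what "just an interface" means here, below is a minimal sketch
in C. It assumes a table of callbacks similar in spirit to the Table AM
routine struct. All names in it (ToasterRoutine, ToastedValue, lookup_toaster,
roundtrip_ok and the two toy toasters) are hypothetical and are not taken from
the patch set; the real API has more callbacks and works with varlena datums,
not raw byte buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; they are not taken from the patch set. */

/* Opaque stored form of a value; only the owning toaster interprets it. */
typedef struct ToastedValue
{
    unsigned char *data;
    size_t         len;
} ToastedValue;

/* The common entry point: one table of callbacks per TOAST implementation. */
typedef struct ToasterRoutine
{
    const char     *name;
    ToastedValue  (*toast)(const unsigned char *value, size_t len);
    unsigned char *(*detoast)(const ToastedValue *tv, size_t *len_out);
} ToasterRoutine;

/* Toy toaster 1: stores the bytes unchanged. */
static ToastedValue
plain_toast(const unsigned char *value, size_t len)
{
    ToastedValue tv = { malloc(len ? len : 1), len };

    assert(tv.data != NULL);
    memcpy(tv.data, value, len);
    return tv;
}

static unsigned char *
plain_detoast(const ToastedValue *tv, size_t *len_out)
{
    unsigned char *out = malloc(tv->len ? tv->len : 1);

    assert(out != NULL);
    memcpy(out, tv->data, tv->len);
    *len_out = tv->len;
    return out;
}

/* Toy toaster 2: a different internal layout (bytes stored reversed). */
static ToastedValue
reversed_toast(const unsigned char *value, size_t len)
{
    ToastedValue tv = { malloc(len ? len : 1), len };

    assert(tv.data != NULL);
    for (size_t i = 0; i < len; i++)
        tv.data[i] = value[len - 1 - i];
    return tv;
}

static unsigned char *
reversed_detoast(const ToastedValue *tv, size_t *len_out)
{
    unsigned char *out = malloc(tv->len ? tv->len : 1);

    assert(out != NULL);
    for (size_t i = 0; i < tv->len; i++)
        out[i] = tv->data[tv->len - 1 - i];
    *len_out = tv->len;
    return out;
}

/* Stand-in for a catalog of toasters that SQL syntax would select from. */
static const ToasterRoutine toasters[] = {
    { "plain",    plain_toast,    plain_detoast },
    { "reversed", reversed_toast, reversed_detoast },
};

/* Returns the routine registered under the given name, or NULL. */
const ToasterRoutine *
lookup_toaster(const char *name)
{
    for (size_t i = 0; i < sizeof(toasters) / sizeof(toasters[0]); i++)
        if (strcmp(toasters[i].name, name) == 0)
            return &toasters[i];
    return NULL;
}

/* Caller side (think: a TAM): uses only the interface, never the layout. */
int
roundtrip_ok(const ToasterRoutine *t, const char *text)
{
    size_t         len = strlen(text);
    size_t         out_len = 0;
    ToastedValue   tv = t->toast((const unsigned char *) text, len);
    unsigned char *out = t->detoast(&tv, &out_len);
    int            ok = (out_len == len) && memcmp(out, text, len) == 0;

    free(out);
    free(tv.data);
    return ok;
}
```

In this sketch a caller would do something like
roundtrip_ok(lookup_toaster("reversed"), "hello") and get the same value back
without knowing how the toaster stored it. That is the only point of the
example: the caller depends on the callback table, and each toaster keeps
its storage format to itself.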

Moreover, our patches show that even the Generic (default) TOAST implementation
could still be left as-is, without the need to route it via our API, though
that is logically wrong because a common API is meant to be common to all
TOAST implementations without exception.

Have I answered your question? Please don't hesitate to point out any unclear
parts, I'd be glad to explain them.

The main idea of the TOAST API is very elegant and light, and it is designed
similarly to Pluggable Storage (Table AM API).

On Mon, Oct 24, 2022 at 12:10 PM Aleksander Alekseev <
aleksander(at)timescale(dot)com> wrote:

> Hi Nikita,
>
> I don't argue with most of what you say. I am just pointing out the
> reason why the chosen approach "N TOASTers x M TableAMs" will not
> work:
>
> > Don't you think that this is an arguable design decision? Basically
> > all we know about the underlying TableAM is that it stores tuples
> > _somehow_ and that tuples have TIDs [1]. That's it. We don't know if
> > it even has any sort of pages, whether they are fixed in size or not,
> > whether it uses shared buffers, etc. It may not even require TOAST.
> > [...]
>
> Also I completely agree with:
>
> > Implementing another Table AM just to implement another TOAST strategy
> > seems too much, the TAM API is very heavy and complex, and you would
> > have to add it as a contrib.
>
> This is what I meant above when talking about the framework for
> simplifying this task:
>
> > It looks like the idea should be actually turned inside out. I.e. what
> > would be nice to have is some sort of _framework_ that helps TableAM
> > authors to implement TOAST (alternatively, the rest of the TableAM
> > except for TOAST) if the TableAM is similar to the default one.
>
> From the user perspective it's much easier to think about one entity -
> TableAM, and choosing from heapam_with_default_toast and
> heapam_with_different_toast.
>
> From the extension implementer point of view creating TableAMs is a
> difficult task. This is what the framework should address. Ideally the
> interface should be as simple as:
>
> CreateParametrizedDefaultHeapAM(SomeTOASTSubstitutionObject, ...other
> arguments, in the future...)
>
> Where the extension author should be worried only about an alternative
> TOAST implementation.
>
> I think at some point such a framework may address at least one more
> issue we have - an inability to change the page size on the table
> level. As it was shown by Tomas Vondra [1] the default 8 KB page size
> can be suboptimal depending on the load. So it would be nice if the
> user could change it without rebuilding PostgreSQL. Naturally this is
> out of scope of this particular patchset. I just wanted to point out
> opportunities we have here.
>
> [1]:
> https://www.postgresql.org/message-id/flat/b4861449-6c54-ccf8-e67c-c039228cdc6d%40enterprisedb.com
>
> --
> Best regards,
> Aleksander Alekseev
>

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
