Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jacob Champion <jchampion(at)timescale(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Pluggable toaster
Date: 2022-11-03 11:33:47
Message-ID: CAN-LCVNOFTyCKKqNSOKwdOt4+CDo+KEy+mvNf5zw_oWNMjXNvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

Thank you very much for the feedback.
Answering accordingly to questions/remarks order:

>I'm in the process of testing and reviewing the v23 patchset. So far I
>just wanted to report that there seems to be certain issues with the
>formatting and/or trailing whitespaces in the patches:

Accepted, will be done.

>There are also some compiler warnings, please see the attachment.

This too. There warnings showed up recently due to fresh changes in macros.

>1.1. Could you please clarify what is the motivation behind such a
>verbose syntax?

Toaster is set for the table column. Each TOASTable column could have
a different Toaster, so column option is the most obvious place to add it.

>1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
>strategies as well. On top of that, why should custom TOASTers have
>any knowledge of the default four-stage algorithm and the storage
>strategies? If the storage strategy is actually ignored, it shouldn't
>be used in the syntax.

EXTENDED storage strategy means that TOASTed value is compressed
before being TOASTed, so no knowledge of its internals could be of any
use. EXTERNAL strategy means that value is being TOASTed in original
form. Storage strategy is the thing internal to AM used, and TOAST
mechanics is not meant to interfere with it. Again, STORAGE EXTERNAL
explicitly shows that value will be stored out-of-line.

>2. Although it's possible to implement some encryption in a TOASTer I
>don't think the documentation should advertise this.

It is a good example of what could the Toaster be responsible for, because
we proposed moving compression into Toasters as a TOAST option -
for example, being set while initializing the Toaster.

>3.1. I believe we should rename this to something like `struct
>ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
>not a routine.

It was done similar to Table AM Routine (please check Pluggable
Storage API), along with some other decisions.

>3.2. The names of the fields should be made consistent - e.g. if you
>have update_toast and copy_toast then del_toast should be renamed to
>delete_toast.
>3.2. Additionally, in some parts of the path del_toast is used, while
>in others - deltoast.

Accepted, will be fixed in the next rebase.

>4. The user documentation should have clear answers on the following
questions:
>4.1. What will happen if the user tries to delete a TOASTer while
>still having data that was TOASTed using this TOASTer? Or this is not
>supported and the TOASTers should exist in the database indefinitely
>after creation?
>4.2. Is it possible to delete and/or alter the default TOASTer?
>4.3. Please make sure the previously discussed clarification regarding
>"N TOASTers vs M TableAMs" and the validation function is explicitly
>present.

Thank you very much for checking the documentation package. These are
very good comments, I'll include these topics in the next patchset.

>5.1. The documentation should clarify how many times init() is called
>- is it done once for the TOASTer, once per relation, etc.
>5.2. Why there is no free() method?

These too. About free() method - Toasters are not meant to be deleted,
we mentioned this several times. They exist once they are created as long
as the DB itself. Have I answered your question?

>6. It's not clear for what reason update_toast() will be executed to
>begin with. This should be clarified in the documentation. How does it
>differ from copy_toast()?

It is not clear because current TOAST mechanics does not have UPDATE
functionality - it doesn't actually update TOASTed value, it marks this
value
"dead" and inserts a new one. This is the cause of TOAST tables bloating
that is being complained about by many users. Update method is provided
for implementation of UPDATE operation.

>7. It should be explicitly stated when validate() is called and what
>happens depending on the return value.

Accepted, will correct this topic.

>8. IMO this section does a very poor job in explaining what this is
>for and when I may want to use this; what particular problem are we
>addressing here?

I already answered this question, maybe the answer was not very clear.
This is just an extension entry point, because for some more advanced
functionality existing pre-defined set of methods would be not enough, i.e.
special Toasters for complex datatypes like JSONb, that have complex
internal structure and may require additional ways to interact with it along
toast/detoast/update/delete.

Also, I'll check README and documentation for typos.

On Thu, Nov 3, 2022 at 1:39 PM Aleksander Alekseev <aleksander(at)timescale(dot)com>
wrote:

> Hi Nikita,
>
> > I'm going to submit a more detailed code review soon.
>
> As promised, here is my feedback.
>
> ```
> + Toaster could be assigned to toastable column with expression
> + <literal>STORAGE external TOASTER
> <replaceable>toaster_name</replaceable></literal>
> ```
>
> 1.1. Could you please clarify what is the motivation behind such a
> verbose syntax?
>
> ```
> +Please note that custom Toasters could be assigned only to External
> +storage type.
> ```
>
> 1.2. That's odd. TOAST should work for EXTENDED and MAIN storage
> strategies as well. On top of that, why should custom TOASTers have
> any knowledge of the default four-stage algorithm and the storage
> strategies? If the storage strategy is actually ignored, it shouldn't
> be used in the syntax.
>
> ```
> +Toasters could use any storage, advanced compression, encryption, and
> ```
>
> 2. Although it's possible to implement some encryption in a TOASTer I
> don't think the documentation should advertise this.
>
> ```
> +typedef struct TsrRoutine
> +{
> + NodeTag type;
> +
> + /* interface functions */
> + toast_init init;
> + toast_function toast;
> + update_toast_function update_toast;
> + copy_toast_function copy_toast;
> + detoast_function detoast;
> + del_toast_function deltoast;
> + get_vtable_function get_vtable;
> + toastervalidate_function toastervalidate;
> +} TsrRoutine;
> ```
>
> 3.1. I believe we should rename this to something like `struct
> ToastImpl`. The `Tsr` abbreviation only creates confusion, and this is
> not a routine.
> 3.2. The names of the fields should be made consistent - e.g. if you
> have update_toast and copy_toast then del_toast should be renamed to
> delete_toast.
> 3.2. Additionally, in some parts of the path del_toast is used, while
> in others - deltoast.
>
> 4. The user documentation should have clear answers on the following
> questions:
> 4.1. What will happen if the user tries to delete a TOASTer while
> still having data that was TOASTed using this TOASTer? Or this is not
> supported and the TOASTers should exist in the database indefinitely
> after creation?
> 4.2. Is it possible to delete and/or alter the default TOASTer?
> 4.3. Please make sure the previously discussed clarification regarding
> "N TOASTers vs M TableAMs" and the validation function is explicitly
> present.
>
> ```
> Toaster initialization. Initial TOAST table creation and other startup
> preparations.
> ```
>
> 5.1. The documentation should clarify how many times init() is called
> - is it done once for the TOASTer, once per relation, etc.
> 5.2. Why there is no free() method?
>
> ```
> Updates TOASTed value. Returns new TOAST pointer.
> ```
>
> 6. It's not clear for what reason update_toast() will be executed to
> begin with. This should be clarified in the documentation. How does it
> differ from copy_toast()?
>
> ```
> Validates Toaster for given data type, storage and compression options.
> ```
>
> 7. It should be explicitly stated when validate() is called and what
> happens depending on the return value.
>
> ```
> +Virtual Functions Table API Extension
> ```
>
> 8. IMO this section does a very poor job in explaining what this is
> for and when I may want to use this; what particular problem are we
> addressing here?
>
> 9. There are typos in the comments and the documentation,
> s/Vaitual/Virtual/, s/vurtual/virtual/ etc. Also there are missing
> articles. Please run your patches through a spellchecker.
>
> I suggest we address this piece of feedback before proceeding further.
>
> --
> Best regards,
> Aleksander Alekseev
>

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Geier 2022-11-03 11:34:41 Add explicit casts in four places to simplehash.h
Previous Message Amit Kapila 2022-11-03 11:19:01 Re: confirmed_flush_lsn shows LSN of the data that has not yet been received by the logical subscriber.