Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jacob Champion <jchampion(at)timescale(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Pluggable toaster
Date: 2022-11-03 14:30:02
Message-ID: CAN-LCVN++kdZ5Z2CmuTzjNOGmHyjM-R24oTKrogHn_a=_h-tKA@mail.gmail.com
Lists: pgsql-hackers

Hi,

Setting a Toaster at the table or database level is a subject for
discussion. There is already a default Toaster. Also, there is not much
sense in setting the Jsonb Toaster as the default even for a table, let
alone a database, because the table could contain other TOASTable
columns that are not of a JSON type.

To be able to set a custom Toaster as the default for a table, you have
to make it work with ALL TOASTable datatypes, which leads to lots and
lots of lines of code, complexity, and difficulty in supporting such a
custom Toaster. Custom Toasters are meant to be rather small and to
specialize in some tricky datatypes or workflows.

Custom Toasters will work with EXTENDED storage, but as I answered in
the previous email, there is not much use in it, because they would
have to deal with compressed data.

>No, encryption is an excellent example of what a TOASTer should NOT
>do. If you are interested in encryption consider joining the "Moving
>forward with TDE" thread [2].

I'm not working with encryption, so maybe it really is an out-of-scope
example. Anyway, compression and dealing with data that has a known
internal structure or some special requirements, like geometric data in
PostGIS, are valid use cases: for example, a custom PostGIS Toaster
gives a considerable performance boost.

>But should we really distinguish INSERT and UPDATE cases on this API
>level? It seems to me that TableAM just inserts new tuples. It's
>TOASTers job to figure out whether similar values existed before and
>should or shouldn't be reused. Additionally a particular TOASTer can
>reuse old values between _different_ rows, potentially even from
>different tables. Another reason why in practice there is little use
>of knowing whether the data is INSERTed or UPDATEd.

For a TOASTer you SHOULD distinguish insert and update operations,
really. For TOASTed data these operations affect many tuples, and the
AM does know which of them were updated and which were not. That is a
very serious limitation of the current TOAST: the TOAST mechanics
should update only the affected tuples instead of marking the whole
record dead and inserting a new one. This is also an argument for not
using EXTENDED storage mode, because with compressed data you do not
have such a choice: you have to drop the whole record.
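
To illustrate the point about touching only the affected tuples, here
is a minimal Python sketch (not PostgreSQL code). It assumes a value is
stored as fixed-size uncompressed chunks and reports which chunk
indexes differ between the old and the new value. The chunk size and
the function names are hypothetical and chosen only for the example.

```python
CHUNK_SIZE = 2000  # hypothetical chunk size, for illustration only


def split_chunks(value: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a value into consecutive fixed-size chunks (last one may be short)."""
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]


def changed_chunks(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return the indexes of chunks that differ between old and new values.

    Chunks that exist in only one of the two values count as changed.
    """
    old_chunks = split_chunks(old, chunk_size)
    new_chunks = split_chunks(new, chunk_size)
    changed = []
    for i in range(max(len(old_chunks), len(new_chunks))):
        old_c = old_chunks[i] if i < len(old_chunks) else None
        new_c = new_chunks[i] if i < len(new_chunks) else None
        if old_c != new_c:
            changed.append(i)
    return changed


# A 10,000-byte value is stored as 5 chunks; flipping one byte at offset
# 4500 affects only the chunk covering offsets 4000..5999.
old_value = bytes(10_000)
new_value = old_value[:4500] + b"\x01" + old_value[4501:]
print(changed_chunks(old_value, new_value))  # prints [2]
```

In this sketch only one of five chunks would need to be rewritten. The
comparison is done on uncompressed bytes; it says nothing about how a
compressed representation would behave.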

A correctly implemented UPDATE for TOAST boosts performance and
considerably decreases the size of TOAST tables along with the WAL
size. This is not in question: an UPDATE operation for TOASTed data is
a must. Consider updating a 1 GB TOASTed record: with the current TOAST
you would end up with two 1 GB records in the table, one of them dead,
and 2 GB in WAL. With update you would have the same 1 GB record and
only the update diff in WAL.
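
The arithmetic above can be written down as a small Python sketch. The
figures for the current behaviour are taken as stated in this email;
the 1 MB diff size and the function names are assumptions made only for
the example, not measurements.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def full_rewrite_cost(value_size: int) -> dict:
    """Current TOAST, using the figures from this email: the table holds
    the old (dead) and the new copy, and WAL is put at twice the value size."""
    return {"table": 2 * value_size, "wal": 2 * value_size}


def in_place_update_cost(value_size: int, diff_size: int) -> dict:
    """TOAST-aware update: the same record stays in the table and only
    the diff is written to WAL."""
    return {"table": value_size, "wal": diff_size}


current = full_rewrite_cost(1 * GB)
proposed = in_place_update_cost(1 * GB, 1 * MB)  # 1 MB diff is hypothetical

print("table bytes:", current["table"], "vs", proposed["table"])
print("WAL bytes:  ", current["wal"], "vs", proposed["wal"])
```

With these assumed inputs the table footprint is halved and the WAL
volume shrinks from 2 GB to the size of the diff.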

>Users should be able to DROP extension. I seriously doubt that the
>patch is going to be accepted as long as it has this limitation.

There is a mention in the documentation and in the previous discussion
that this operation would lead to loss of the data TOASTed with this
custom Toaster. It was stated as an issue and a subject for further
discussion in previous emails.

Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
