Re: Pluggable toaster

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pluggable toaster
Date: 2022-01-17 22:25:56
Message-ID: CAN-LCVM27i5wPNfGmmzx9KNeSd+i9=tbJ1rQwpmxmv5L5VWsYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

>This sounds interesting, but very much like column compression, which
>was proposed some time ago. If we haven't made much progrees with that
>patch (AFAICS), what's the likelihood we'll succeed here, when it's
>combined with yet more complexity?
The main concern is that this patch provides open API for Toast
functionality
and new Toasters could be written as extensions and plugged in instead of
default one, for any column (or datatype). It was not possible before,
because
Toast functionality was part of the Postgres core code and was not meant to
be modified.

>Maybe doing that kind of compression in TOAST is somehow simpler, but I
>don't see it.
[Custom ]Toaster extension itself is not restricted to only toast data, it
could be
used for compression, encryption, etc, just name it - any case when data
meant
to be transformed in some complicated way before being stored and
(possibly)
transformed backwards while being selected from the table, along with some
simpler but not so obvious transformations like removing common parts shared
by all data in column before storing it and restoring column value to full
during
selection.

>Seems you'd need a mapping table, to allow M:N mapping between types and
>toasters, linking it to all "compatible" types. It's not clear to me how
>would this work with custom data types, domains etc.
Any suitable [custom] Toaster could be plugged in for any table column,
or [duscussible] for datatype and assigned by default to the according
column
for any table using this datatype.

>Also, what happens to existing values when you change the toaster? What
>if the toasters don't use the same access method to store the chunks
>(heap vs. btree)? And so on.
For newer data there is no problem - it will be toasted by the newly
assigned Toaster.
For detoasting - the Toaster ID is stored in Toast pointer and in principle
all data
could be detoasted by the according toaster if it is available. But this is
the topic
for discussion and we are open for proposals, because there are possible
cases
where older Toaster is not available - the older used Toaster extension is
not installed
at all or was uninstalled, it was upgraded to the newer version. Currently
we see
two ways of handling this case - to restrict changing the toaster, and to
re-toast
all toasted data which could be very heavy if Toaster is assigned to a
widely used
datatype, and we're looking forward to any ideas.

>So you suggest we move all of this to toaster? I'd say -1 to that,
>because it makes it much harder to e.g. add custom compression method, etc.
Original compression methods, etc. are not affected by this patch.

Regards,

--
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

On Mon, Jan 17, 2022 at 10:23 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:

>
>
> On 1/14/22 19:41, Teodor Sigaev wrote:
> >
> >> In my understanding, we want to be able to
> >> 1. Access data from a toasted object one slice at a time, by using
> >> knowledge of the structure
> >> 2. If toasted data is updated, then update a minimum number of
> >> slices(s), without rewriting the existing slices
> >> 3. If toasted data is expanded, then allownew slices to be appended to
> >> the object without rewriting the existing slices
> >
> > There are more options:
> > 1 share common parts between not only versions of row but between all
> > rows in a column. Seems strange but examples:
> > - urls often have a common prefix and so storing in a prefix tree (as
> > SP-GiST does) allows significantly decrease storage size
> > - the same for json - it's often use case with common part of its
> > hierarchical structure
> > - one more usecase for json. If json use only a few schemes
> > (structure) it's possible to store in toast storage only values and
> > don't store keys and structure
>
> This sounds interesting, but very much like column compression, which
> was proposed some time ago. If we haven't made much progrees with that
> patch (AFAICS), what's the likelihood we'll succeed here, when it's
> combined with yet more complexity?
>
> Maybe doing that kind of compression in TOAST is somehow simpler, but I
> don't see it.
>
> > 2 Current toast storage stores chunks in heap accesses method and to
> > provide fast access by toast id it makes an index. Ideas:
> > - store chunks directly in btree tree, pgsql's btree already has an
> > INCLUDE columns, so, chunks and visibility data will be stored only
> > in leaf pages. Obviously it reduces number of disk's access for
> > "untoasting".
> > - use another access method for chunk storage
> >
>
> Maybe, but that probably requires more thought - e.g. btree requires the
> values to be less than 1/3 page, so I wonder how would that play with
> toasting of values.
>
> >> ISTM that we would want the toast algorithm to be associated with the
> >> datatype, not the column?
> >> Can you explain your thinking?
> > Hm. I'll try to explain my motivation.
> > 1) Datatype could have more than one suitable toasters. For different
> > usecases: fast retrieving, compact storage, fast update etc. As I
> > told above, for jsonb there are several optimal strategies for
> > toasting: for values with a few different structures, for close to
> > hierarchical structures, for values with different parts by access
> > mode (easy to imagine json with some keys used for search and some
> > keys only for output to user)
> > 2) Toaster could be designed to work with different data type. Suggested
> > appendable toaster is designed to work with bytea but could work with
> > text
> >
> > Looking on this point I have doubts where to store connection between
> > toaster and datatype. If we add toasteroid to pg_type how to deal with
> > several toaster for one datatype? (And we could want to has different
> > toaster on one table!) If we add typoid to pg_toaster then how it will
> > work with several datatypes? An idea to add a new many-to-many
> > connection table seems workable but here there are another questions,
> > such as will any toaster work with any table access method?
> >
> > To resolve this bundle of question we propose validate() method of
> > toaster, which should be called during DDL operation, i.e. toaster is
> > assigned to column or column's datatype is changed.
> >
>
> Seems you'd need a mapping table, to allow M:N mapping between types and
> toasters, linking it to all "compatible" types. It's not clear to me how
> would this work with custom data types, domains etc.
>
> Also, what happens to existing values when you change the toaster? What
> if the toasters don't use the same access method to store the chunks
> (heap vs. btree)? And so on.
>
> > More thought:
> > Now postgres has two options for column: storage and compression and now
> > we add toaster. For me it seems too redundantly. Seems, storage should
> > be binary value: inplace (plain as now) and toastable. All other
> > variation such as toast limit, compression enabling, compression kind
> > should be an per-column option for toaster (that's why we suggest valid
> > toaster oid for any column with varlena/toastable datatype). It looks
> > like a good abstraction but we will have a problem with backward
> > compatibility and I'm afraid I can't implement it very fast.
> >
>
> So you suggest we move all of this to toaster? I'd say -1 to that,
> because it makes it much harder to e.g. add custom compression method, etc.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-01-17 22:41:10 Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Previous Message Alvaro Herrera 2022-01-17 22:15:40 Re: a misbehavior of partition row movement (?)