Re: RFI: Extending the TOAST Pointer

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Nikita Malakhov <hukutoc(at)gmail(dot)com>
Subject: Re: RFI: Extending the TOAST Pointer
Date: 2023-05-22 13:07:18
Message-ID: CAEze2WgbmRQ43EuNUS5x5PZFL3SCpt-P+OY4=89daGqJkWHNvw@mail.gmail.com
Lists: pgsql-hackers

On Sun, 21 May 2023, 15:39 Aleksander Alekseev,
<aleksander(at)timescale(dot)com> wrote:
>
> Hi,
>
> > We'd need to stop using the va_tag as length indicator, but I don't
> > think it's currently assumed to be a length indicator anyway (see
> > VARSIZE_EXTERNAL(ptr)). By not using the varatt_external struct
> > currently in use, we could be able to get down to <18B toast pointers
> > as well, though I'd consider that unlikely.
>
> Agree.
>
> Another thing we have to decide is what to do exactly in the scope of
> this thread.
>
> I imagine it as a refactoring that will find all the places that deal
> with the current TOAST pointer and change them to something like:
>
> ```
> switch (va_tag) {
>     case DEFAULT_VA_TAG:    /* equals 18 */
>         default_toast_process_case_abc(...);
>         break;
>     default:
>         elog(ERROR, "Unknown TOAST tag");
> }
> ```

I'm not sure that we need all that.
Many places do some special handling for VARATT_IS_EXTERNAL because
decompressing or detoasting is expensive and doing that as late as
possible can be beneficial (e.g. EXPLAIN ANALYZE can run much faster
because we never detoast returned columns). But very few of these
cases operate specifically on on-disk data: my IDE can't find any
uses of VARATT_IS_EXTERNAL_ONDISK (i.e. the actual TOASTed value)
outside the expected locations: the toast subsystem, amcheck, and
logical decoding (incl. the pgoutput plugin). I'm fairly sure we only
need to update the existing paths in those subsystems to support
another external format (one distinct from the current VARTAG_ONDISK).
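
To make that concrete, here is a minimal standalone sketch, not taken
from any patch. The tag values mirror PostgreSQL's vartag_external
enum; TAG_ONDISK_V2 and tag_is_ondisk() are hypothetical names made up
for illustration, and real code would keep using the existing VARATT_*
macros.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Standalone mirror of the external varlena tags. TAG_ONDISK_V2 is a
 * hypothetical second on-disk format; it does not exist in PostgreSQL.
 */
typedef enum
{
    TAG_INDIRECT = 1,
    TAG_EXPANDED_RO = 2,
    TAG_EXPANDED_RW = 3,
    TAG_ONDISK = 18,
    TAG_ONDISK_V2 = 19      /* assumption: any unused tag value would do */
} external_tag;

/*
 * Only code that reads the on-disk payload (toast subsystem, amcheck,
 * logical decoding) would need a check like this one.
 */
static bool
tag_is_ondisk(uint8_t tag)
{
    return tag == TAG_ONDISK || tag == TAG_ONDISK_V2;
}
```

Callers that only ask "is this value external?" never inspect the tag,
so under this assumption they would not need to change.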

> So that next time somebody is going to need another type of TOAST
> pointer this person will have only to add a corresponding tag and
> handlers. (Something like "virtual methods" would produce cleaner
> code but would also break branch prediction, so I don't think we
> should use those.)

Yeah, I'm also not super stoked about using virtual methods for a new
external toast implementation.

> I don't think we need an example of adding a new TOAST tag in scope of
> this work since the default one is going to end up being such an
> example.
>
> Does it make sense?

I see your point, but I do think we should also think about why we do
the change.

E.g.: Our current toast infra is built around 4 uint32 fields in the
toast pointer; but with this change in place we can devise a new toast
pointer that uses varint encoding on the length-indicating fields to
reduce the footprint from 18B to an expected 14 bytes.
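
A rough sketch of where the 14 bytes could come from, under
assumptions of my own choosing: a 7-bits-per-byte (LEB128-style)
varint, applied to the two length fields only, with the 2 header bytes
and the two 4-byte OIDs left as they are. All function names below are
hypothetical, and the compression-method bits are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A uint32 needs at most 5 bytes at 7 payload bits per byte. */
#define VARINT_U32_MAX_BYTES 5

/* Encode value, low 7 bits first; returns the number of bytes written. */
static size_t
varint_encode_u32(uint32_t value, uint8_t *out)
{
    size_t      n = 0;

    assert(out != NULL);
    while (value >= 0x80)
    {
        out[n++] = (uint8_t) (value | 0x80);    /* continuation bit set */
        value >>= 7;
    }
    out[n++] = (uint8_t) value;                 /* last byte, high bit clear */
    return n;
}

/* Decode what varint_encode_u32 wrote; returns the bytes consumed. */
static size_t
varint_decode_u32(const uint8_t *in, uint32_t *value)
{
    uint32_t    result = 0;
    size_t      n = 0;
    int         shift = 0;

    assert(in != NULL && value != NULL);
    for (;;)
    {
        uint8_t     b = in[n++];

        result |= (uint32_t) (b & 0x7F) << shift;
        if ((b & 0x80) == 0)
            break;
        shift += 7;
        assert(shift < 35);                     /* more than 5 bytes is invalid */
    }
    *value = result;
    return n;
}

/*
 * Size of a hypothetical compact pointer: 2 header bytes, varint raw
 * size, varint external size, 4-byte value OID, 4-byte toast relation OID.
 */
static size_t
compact_toast_pointer_size(uint32_t rawsize, uint32_t extsize)
{
    uint8_t     scratch[VARINT_U32_MAX_BYTES];

    return 2 + varint_encode_u32(rawsize, scratch)
        + varint_encode_u32(extsize, scratch) + 4 + 4;
}
```

With these assumptions, a value whose raw and external sizes are both
below 16 KiB gets a 14-byte pointer, and sizes below 2 MiB give 16
bytes. Lengths of 256 MiB and up need 5 bytes each, so the worst case
(20 bytes) is larger than today's 18.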

Kind regards,

Matthias van de Meent
Neon, Inc.
