Re: GSOC - TOAST'ing in slices

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: George Papadrosou <gpapadrosou(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSOC - TOAST'ing in slices
Date: 2017-03-15 13:53:59
Message-ID: 19235.1489586039@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
> <gpapadrosou(at)gmail(dot)com> wrote:
>> The project’s idea is implement different slicing approaches according to
>> the value’s datatype. For example a text field could be split upon character
>> boundaries while a JSON document would be split in a way that allows fast
>> access to it’s keys or values.

> Hmm. So if you had a long text field containing multibyte characters,
> and you split it after, say, every 1024 characters rather than after
> every N bytes, then you could do substr() without detoasting the whole
> field. On the other hand, my guess is that you'd waste a fair amount
> of space in the TOAST table, because it's unlikely that the chunks
> would be exactly the right size to fill every page of the table
> completely. On balance it seems like you'd be worse off, because
> substr() probably isn't all that common an operation.

Keep in mind also that slicing on "interesting" boundaries rather than
with the current procrustean-bed approach could save you at most one or
two chunk fetches per access. So the upside seems limited. Moreover,
how are you going to know whether a given toast item has been stored
according to your newfangled approach? I doubt we're going to accept
forcing a dump/reload for this.

IMO, the real problem here is to be able to predict which chunk(s) to
fetch at all, and I'd suggest focusing on that part of the problem rather
than changes to physical storage. It's hard to see how to do anything
very smart for text (except in the single-byte-encoding case, which is
already solved). But the JSONB format was designed with some thought
to this issue, so you might be able to get some traction there.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2017-03-15 13:56:19 Re: Allow pg_dumpall to work without pg_authid
Previous Message Amit Kapila 2017-03-15 13:50:08 Re: GUC for cleanup indexes threshold.