Re: GSOC - TOAST'ing in slices

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: George Papadrosou <gpapadrosou(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSOC - TOAST'ing in slices
Date: 2017-03-15 11:17:08
Message-ID: CA+TgmoZOTjzt4qCHQJNtvJ-EksUqdHNhT_WvEwaL5ZjpAyz3Eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
<gpapadrosou(at)gmail(dot)com> wrote:
> The project’s idea is implement different slicing approaches according to
> the value’s datatype. For example a text field could be split upon character
> boundaries while a JSON document would be split in a way that allows fast
> access to it’s keys or values.

Hmm. So if you had a long text field containing multibyte characters,
and you split it after, say, every 1024 characters rather than after
every N bytes, then you could do substr() without detoasting the whole
field. On the other hand, my guess is that you'd waste a fair amount
of space in the TOAST table, because it's unlikely that the chunks
would be exactly the right size to fill every page of the table
completely. On balance it seems like you'd be worse off, because
substr() probably isn't all that common an operation.

Now, in contrast, slicing JSON is a very common operation, so a
smarter slicing scheme might well pay off, but the question is - what
kind of a splitting method would actually allow fast access to the
keys or values? It strikes me that this might be a difficult problem.
Tabula raza, you could design a serialization format that was aware
that it might get toasted and was constructed in such a way that as to
contain boundaries that are actually referenced from within the
format, so that, say, after reading the toplevel keys and values, you
could know that you next need chunk #103. But unless the existing
jsonb binary format was designed with that in mind, it doesn't seem
likely to end up being true just by chance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arthur Zakirov 2017-03-15 11:28:24 Re: IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements
Previous Message Robert Haas 2017-03-15 11:05:40 Re: Remove obsolete text from hash/README