Re: GSOC - TOAST'ing in slices

From: George Papadrosou <gpapadrosou(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSOC - TOAST'ing in slices
Date: 2017-03-16 17:19:54
Message-ID: A9D27575-54EF-41D5-A0E9-036A679515BD@gmail.com
Lists: pgsql-hackers

Hello all,

thank you for your replies. I agree with Alexander Korotkov that it is important to have a quality patch at the end of the summer.

Stephen, you mentioned PostGIS, but the conversation seems to lean towards JSONB. What are your thoughts?

Also, if I am to include some ideas/approaches in the proposal, it seems I should really focus on understanding how a specific data type is used, queried and indexed, which is a lot of exploring for a newcomer to the PostgreSQL code.

In the meantime, I am trying to find out how jsonb is indexed and queried. Once I grasp the current situation, I will be able to think about new approaches.

Regards,
George

> On 15 Mar 2017, at 15:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
>> <gpapadrosou(at)gmail(dot)com> wrote:
>>> The project’s idea is to implement different slicing approaches according to
>>> the value’s datatype. For example a text field could be split upon character
>>> boundaries while a JSON document would be split in a way that allows fast
>>> access to its keys or values.
>
>> Hmm. So if you had a long text field containing multibyte characters,
>> and you split it after, say, every 1024 characters rather than after
>> every N bytes, then you could do substr() without detoasting the whole
>> field. On the other hand, my guess is that you'd waste a fair amount
>> of space in the TOAST table, because it's unlikely that the chunks
>> would be exactly the right size to fill every page of the table
>> completely. On balance it seems like you'd be worse off, because
>> substr() probably isn't all that common an operation.
>
> Keep in mind also that slicing on "interesting" boundaries rather than
> with the current procrustean-bed approach could save you at most one or
> two chunk fetches per access. So the upside seems limited. Moreover,
> how are you going to know whether a given toast item has been stored
> according to your newfangled approach? I doubt we're going to accept
> forcing a dump/reload for this.
>
> IMO, the real problem here is to be able to predict which chunk(s) to
> fetch at all, and I'd suggest focusing on that part of the problem rather
> than changes to physical storage. It's hard to see how to do anything
> very smart for text (except in the single-byte-encoding case, which is
> already solved). But the JSONB format was designed with some thought
> to this issue, so you might be able to get some traction there.
>
> regards, tom lane
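
To make the "predict which chunk(s) to fetch" point concrete, here is a minimal sketch in C of the arithmetic involved when chunks have a fixed size. It is an illustration only, not PostgreSQL source: the function name chunks_for_slice, the ChunkRange struct and both constants are hypothetical, and the 1996-byte value is an assumption about the default chunk size on 8 kB pages. With byte offsets it covers the single-byte-encoding case; with character offsets it covers the hypothetical scheme where every chunk holds exactly 1024 characters.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; neither is taken from the thread. */
#define ASSUMED_CHUNK_BYTES 1996   /* bytes per chunk (assumption) */
#define ASSUMED_CHUNK_CHARS 1024   /* characters per chunk in the proposed scheme */

/* Inclusive range of chunk sequence numbers covering a slice. */
typedef struct ChunkRange
{
    int32_t first;
    int32_t last;
} ChunkRange;

/*
 * Given a slice [offset, offset + length) measured in some unit (bytes or
 * characters) and a fixed number of those units per chunk, return the first
 * and last chunk that must be fetched. This only works when every chunk
 * except possibly the last holds exactly units_per_chunk units.
 */
static ChunkRange
chunks_for_slice(int32_t offset, int32_t length, int32_t units_per_chunk)
{
    ChunkRange r;

    assert(offset >= 0 && length > 0 && units_per_chunk > 0);

    r.first = offset / units_per_chunk;
    r.last = (offset + length - 1) / units_per_chunk;
    return r;
}
```

For example, chunks_for_slice(1990, 10, ASSUMED_CHUNK_BYTES) yields chunks 0 through 1, because the slice straddles the first chunk boundary. For multibyte text stored in byte-sized chunks, the character offset does not map to a byte offset without scanning, which is why the byte-based calculation alone does not help there.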
