From: | George Papadrosou <gpapadrosou(at)gmail(dot)com> |
---|---|
To: | Stephen Frost <sfrost(at)snowman(dot)net> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: GSOC - TOAST'ing in slices |
Date: | 2017-03-16 17:19:54 |
Message-ID: | A9D27575-54EF-41D5-A0E9-036A679515BD@gmail.com |
Lists: | pgsql-hackers |
Hello all,
thank you for your replies. I agree with Alexander Korotkov that it is important to have a quality patch at the end of the summer.
Stephen, you mentioned PostGIS, but the conversation seems to lean towards JSONB. What are your thoughts?
Also, if I am to include some ideas/approaches in the proposal, it seems I should really focus on understanding how a specific data type is used, queried and indexed, which is a lot of exploring for a newcomer to the Postgres code.
In the meantime, I am trying to find out how jsonb is indexed and queried. Once I grasp the current situation, I will be able to think about new approaches.
Regards,
George
> On 15 Mar 2017, at 15:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Mar 14, 2017 at 10:03 PM, George Papadrosou
>> <gpapadrosou(at)gmail(dot)com> wrote:
>>> The project’s idea is to implement different slicing approaches according to
>>> the value’s datatype. For example a text field could be split upon character
>>> boundaries while a JSON document would be split in a way that allows fast
>>> access to its keys or values.
>
>> Hmm. So if you had a long text field containing multibyte characters,
>> and you split it after, say, every 1024 characters rather than after
>> every N bytes, then you could do substr() without detoasting the whole
>> field. On the other hand, my guess is that you'd waste a fair amount
>> of space in the TOAST table, because it's unlikely that the chunks
>> would be exactly the right size to fill every page of the table
>> completely. On balance it seems like you'd be worse off, because
>> substr() probably isn't all that common an operation.
>
> Keep in mind also that slicing on "interesting" boundaries rather than
> with the current procrustean-bed approach could save you at most one or
> two chunk fetches per access. So the upside seems limited. Moreover,
> how are you going to know whether a given toast item has been stored
> according to your newfangled approach? I doubt we're going to accept
> forcing a dump/reload for this.
>
> IMO, the real problem here is to be able to predict which chunk(s) to
> fetch at all, and I'd suggest focusing on that part of the problem rather
> than changes to physical storage. It's hard to see how to do anything
> very smart for text (except in the single-byte-encoding case, which is
> already solved). But the JSONB format was designed with some thought
> to this issue, so you might be able to get some traction there.
>
> regards, tom lane
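
To illustrate the "predict which chunk(s) to fetch" point from the quoted text: with fixed-size chunks and a single-byte encoding, the chunks covering a substring follow from integer division alone. The sketch below is not PostgreSQL code. The function name is hypothetical, and the chunk size of 1996 bytes is an assumption (believed to be the default for 8 kB pages). The last lines show why the same arithmetic fails for a multibyte encoding: the byte offset of a character depends on the content before it.

```python
# Assumed chunk size in bytes; treat it as a parameter, not a guaranteed constant.
ASSUMED_CHUNK_SIZE = 1996


def chunks_for_byte_range(offset, length, chunk_size=ASSUMED_CHUNK_SIZE):
    """Return (first_chunk, last_chunk), the 0-based chunk numbers that
    cover the byte range [offset, offset + length)."""
    if offset < 0 or length <= 0 or chunk_size <= 0:
        raise ValueError("offset must be >= 0; length and chunk_size must be > 0")
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    return first_chunk, last_chunk


# Single-byte encoding: character offsets equal byte offsets, so a
# substr-like request maps directly to a chunk range.
first, last = chunks_for_byte_range(offset=1990, length=10)

# Multibyte encoding: the byte offset of character N depends on what
# precedes it, so the chunk cannot be predicted from N alone.
ascii_prefix = "abcd".encode("utf-8")   # 4 characters, 4 bytes
greek_prefix = "αβγδ".encode("utf-8")   # 4 characters, 8 bytes
```

Here a 10-byte read starting at byte 1990 spans chunks 0 and 1, so only two chunks would need to be fetched rather than the whole value.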