Re: [HACKERS] Slicing TOAST

From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-students(at)postgresql(dot)org
Subject: Re: [HACKERS] Slicing TOAST
Date: 2013-05-14 07:50:50
Message-ID: 5191ECDA.3080204@krosing.net
Lists: pgsql-hackers pgsql-students

On 05/14/2013 10:05 AM, Simon Riggs wrote:
> I'm proposing this now as a possible GSoC project:
>
> In 1-byte character encodings (i.e. not UTF-8), SUBSTR() is optimised
> to allow seeking straight to the exact slice when retrieving a large
> toasted value. This reduces I/O considerably when you have large
> toasted values since it is an O(1) action rather than an O(N).
>
> This is possible because the slicing of toasted values is predictable
> on 1 byte encodings.
>
> It would be useful to have a predictable function perform the slicing,
> so we could use that knowledge later to optimise searches in a wider
> range of situations. More specifically, since UTF-8 is so common, to
> allow optimisations in that encoding of common data: text, XML, JSON.
>
> e.g. if we knew that an XML document has a required element called
> TITLE and that occurs only once and always in the first slice, it
> would be useful information to use in search functions. (Not sure, but
> it may be possible to assign non-consecutive slice numbers to allow
> variable data mid-way through a column value if needed).
>
> e.g. in UTF-8 free text we could put 500 characters in each slice, so
> that even if that could be anywhere between 500 and 2000 bytes it
> would still fit just fine.
>
> e.g. for arrays, if we put say 200 elements per slice, then accessing
> particular elements would require only 1 slice retrieval.
>
> Doing this would *possibly* reduce packing density, but not certainly
> so. But it would greatly improve access times to large structured
> toast values.
On the contrary, it would enable us to pack the chunks so that more
of them fit on the page, especially for :)

That is, first chunk the value into N-byte pieces, then compress each
chunk separately.
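
A rough sketch of the chunk-then-compress idea, in Python rather than backend C, just to show why a slice can then be fetched without touching the rest of the value. Everything here is an assumption for illustration: zlib stands in for whatever compressor the backend would use, CHUNK_SIZE = 2000 is an arbitrary N, and the names chunk_then_compress and fetch_slice are hypothetical, not existing PostgreSQL functions.

```python
import zlib

# Hypothetical slice size N in bytes; not PostgreSQL's actual TOAST chunk size.
CHUNK_SIZE = 2000


def chunk_then_compress(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks first, then compress each chunk on its own."""
    return [
        zlib.compress(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    ]


def fetch_slice(chunks: list[bytes], offset: int, length: int,
                chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset + length], decompressing only the overlapping chunks."""
    if length <= 0:
        return b""
    # Chunk boundaries are predictable because chunking happened before compression.
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    joined = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    skip = offset - first * chunk_size
    return joined[skip:skip + length]
```

Because every chunk holds exactly N uncompressed bytes (except the last), the chunk number for any byte offset is a simple division, and the compressed chunks are smaller, so more of them fit on a page. The cost is that each chunk is compressed without the context of its neighbours, so the overall ratio may be somewhat worse than compressing the whole value at once.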

-----------------
Hannu
