Seekable compressed TOAST [POC]

From: Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Seekable compressed TOAST [POC]
Date: 2017-05-22 10:41:40
Message-ID: dea9d6cc4cdabc609bede5ef6677adef@postgrespro.ru
Lists: pgsql-hackers

Good day, everyone.

This is just a proposal.
Main goal: make compressed TOAST values seekable.
If every chunk is compressed separately, toast_fetch_datum_slice can
fetch only the chunks covering the requested slice, as it does with
EXTERNAL storage.
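
To illustrate the idea outside of the backend, here is a minimal Python
sketch. It is not the patch: zlib stands in for pglz, and CHUNK_SIZE,
store_seekable and fetch_slice are hypothetical names chosen for this
example only.

```python
import zlib

# Assumed uncompressed bytes per chunk; not the real TOAST constant.
CHUNK_SIZE = 2000


def store_seekable(value: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split value into fixed-size pieces and compress each piece on its own."""
    return [
        zlib.compress(value[start:start + chunk_size])
        for start in range(0, len(value), chunk_size)
    ]


def fetch_slice(chunks: list[bytes], offset: int, length: int,
                chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return value[offset:offset+length], touching only the chunks it overlaps."""
    if length <= 0 or offset < 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    # Only chunks first..last are decompressed; the rest are never read.
    joined = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    skip = offset - first * chunk_size
    return joined[skip:skip + length]
```

Because every chunk covers a fixed range of the uncompressed value, the
chunk number is simply offset / chunk_size, which is the same arithmetic
that works for EXTERNAL storage.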

The attached patch adds a couple of new column storage types:
- EXTSEEKABLE - like EXTERNAL, but every chunk is compressed separately,
- SEEKABLE - a mix of MAIN and EXTSEEKABLE, i.e. values smaller than 2k
  are treated as MAIN storage, and larger ones as EXTSEEKABLE.

I tested it on the PostgreSQL source code (tables with filename and
content columns):
EXTENDED storage: 1296k + 15032k = 16328k
EXTERNAL storage: 728k + 44552k = 45280k
EXTSEEKABLE: 728k + 23096k = 23824k
SEEKABLE: 768k + 23072k = 23640k

The patch is not complete: the toast pointer looks like an uncompressed
one, so toast_datum_size (and therefore pg_column_size) reports the
uncompressed size of the datum.

And certainly it is just a POC, because a better scheme could exist.

For example, an improved approach could be:
- modify the compression function so that it can stop once it has
  produced the desired amount of compressed data,
- instead of (oid, counter, chunk), use (oid, offset_in_uncompressed,
  chunk) for the toast tuple, so that a chunk can be located quickly,
- using the modified compression function, make chunks close to the
  current 2k limit after compression, but still compressed separately,
  and insert them keyed by their offset in the uncompressed varlena.
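
A rough Python sketch of that improved scheme follows, with loud
assumptions: zlib stands in for pglz, and since zlib cannot be told to
stop at a given output size, a binary search over the input prefix
length emulates the "stop when enough is produced" compressor. TARGET and
all function names are hypothetical.

```python
import bisect
import zlib

# Assumed upper bound on the compressed size of one chunk, in bytes.
TARGET = 2000


def compress_bounded(data: bytes, pos: int, limit: int = TARGET):
    """Compress a prefix of data[pos:] whose compressed form fits in limit.

    Emulates a compressor that stops at a size limit by binary-searching
    the prefix length. Returns (raw_bytes_consumed, compressed_chunk).
    """
    lo, hi = 1, len(data) - pos
    best_len, best_chunk = 0, b""
    while lo <= hi:
        mid = (lo + hi) // 2
        chunk = zlib.compress(data[pos:pos + mid])
        if len(chunk) <= limit:
            best_len, best_chunk = mid, chunk  # fits, try a longer prefix
            lo = mid + 1
        else:
            hi = mid - 1
    if best_len == 0:
        raise ValueError("limit too small to hold even one byte")
    return best_len, best_chunk


def store_by_offset(value: bytes, limit: int = TARGET):
    """Return parallel lists: uncompressed start offsets and compressed chunks."""
    offsets, chunks = [], []
    pos = 0
    while pos < len(value):
        consumed, chunk = compress_bounded(value, pos, limit)
        offsets.append(pos)
        chunks.append(chunk)
        pos += consumed
    return offsets, chunks


def fetch_slice_by_offset(offsets, chunks, offset: int, length: int) -> bytes:
    """Return value[offset:offset+length] using the offset key to find chunks."""
    if length <= 0 or offset < 0 or not offsets:
        return b""
    # Last chunk whose start offset is <= the requested offset.
    first = bisect.bisect_right(offsets, offset) - 1
    end = offset + length
    out = bytearray()
    i = first
    while i < len(chunks) and offsets[i] < end:
        out += zlib.decompress(chunks[i])
        i += 1
    skip = offset - offsets[first]
    return bytes(out[skip:skip + length])
```

The bisect lookup plays the role an index on (oid,
offset_in_uncompressed) would play in the toast table. Compressed size is
not strictly monotonic in input length, so the search may not find the
longest possible prefix, but every chunk it returns is verified to fit.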

Another improvement could be building a dictionary common to all chunks
and storing it in a chunk numbered -1.
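
As a sketch of the dictionary idea only: zlib's preset dictionary (the
zdict argument) is used here in place of whatever a pglz-based scheme
would do. How the dictionary is built is left open, so it is simply
passed in by the caller. The function names and the dict-based "toast
table" are hypothetical.

```python
import zlib


def compress_with_dict(chunk: bytes, zdict: bytes) -> bytes:
    """Compress one chunk independently, primed with the shared dictionary."""
    c = zlib.compressobj(zdict=zdict)
    return c.compress(chunk) + c.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress one chunk; the same dictionary must be supplied."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


def store_with_dictionary(value: bytes, zdict: bytes,
                          chunk_size: int = 2000) -> dict[int, bytes]:
    """Map chunk number -> stored bytes, with the dictionary at key -1."""
    stored = {-1: zdict}
    for n, start in enumerate(range(0, len(value), chunk_size)):
        stored[n] = compress_with_dict(value[start:start + chunk_size], zdict)
    return stored


def fetch_chunk(stored: dict[int, bytes], n: int) -> bytes:
    """Read chunk n: one extra lookup for chunk -1, then a normal decompress."""
    return decompress_with_dict(stored[n], stored[-1])
```

Each chunk stays independently decompressible, at the cost of reading
chunk -1 in addition to the chunks covering the slice.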

PS. An interesting result with a tsvector of the source code:
EXTENDED storage: 896k + 16144k = 17040k
EXTERNAL storage: 896k + 16248k = 17144k
EXTSEEKABLE: 896k + 15792k = 16688k
SEEKABLE: 952k + 15752k = 16704k

So, a) it looks like tsvector is almost incompressible (so its default
storage should probably be EXTERNAL), and b) it compresses better in
chunks.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company

Attachment Content-Type Size
0001-attr-storage-SEEKABLE-and-EXTSEEKABLE.patch text/x-diff 13.3 KB
