Optimize external TOAST storage

From: davinder singh <davindersingh2692(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Optimize external TOAST storage
Date: 2022-02-28 09:52:11
Message-ID: CAHzhFSEQ8Vi8pEWeZERPajCbOfvvae2Si5imxftEGhXp6px5iQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
and then divided into chunks. The problem with storing the compressed value
is that if compression does not save enough space to reduce the number of
chunks, we end up paying an extra decompression cost on every read.
Based on the discussion with Robert and Dilip, we tried to optimize this
process. The idea is to TOAST the compressed value only if doing so saves at
least one chunk (2KB by default) of disk storage; otherwise, store the
uncompressed value. This way, we save decompression costs without losing
much on storage.
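Sketching the decision rule in C (the names and the 2KB constant below are illustrative, not the actual patch code; the real TOAST_MAX_CHUNK_SIZE is derived from the block size):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value; the real TOAST_MAX_CHUNK_SIZE is close to 2KB
 * for the default 8KB block size. */
#define TOAST_MAX_CHUNK_SIZE 2000

/* Chunks needed to externalize a value of the given size (ceiling
 * division). */
static size_t toast_chunks(size_t size)
{
    return (size + TOAST_MAX_CHUNK_SIZE - 1) / TOAST_MAX_CHUNK_SIZE;
}

/* Keep the compressed copy only if it saves at least one chunk of
 * on-disk storage; otherwise externalize the uncompressed value and
 * avoid paying decompression on every read. (Hypothetical helper,
 * not the patch's actual function.) */
static bool store_compressed(size_t raw_size, size_t compressed_size)
{
    return toast_chunks(compressed_size) < toast_chunks(raw_size);
}
```

With these definitions, a value that compresses from 8000 to 3000 bytes drops from 4 chunks to 2 and stays compressed, while one that compresses from 3966 to 2012 bytes still needs 2 chunks and is stored uncompressed.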

In our tests, we observed read-performance improvements of up to 28% by
giving up at most TOAST_MAX_CHUNK_SIZE (2KB) of disk storage per value. The
gain is larger for big attributes (size > 4KB) because
compression/decompression cost grows with size.
However, we found this assumption does not hold when the compression ratio
is high and the value is barely big enough to be TOASTed. For example, in
test 4.b below, we got no performance advantage, but the table grew by 42%
from storing uncompressed values.
Test Setup.

create table t1_lz4 (a text compression lz4, b text compression lz4);

-- Generate random data of roughly len_info bytes
-- (each md5 value contributes 32 chars plus an array separator,
-- hence the division by 33.0)
create or replace function generate_att_data(len_info int)
returns text
language plpgsql
as
$$
declare
    value text;
begin
    select array_agg(md5(g::text))
      into value
      from generate_series(1, round(len_info / 33.0)::int) g;
    return value;
end;
$$;

-- Test query
select b from t1_lz4;

Test 1:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat(generate_att_data(1980), 2);
Summary:
Attribute size: original: 7925 bytes, after compression: 7845 bytes
Time for select: head: 42 sec, patch: 37 sec, *Performance Gain: 11%*
Table size: head: 1662 MB, patch: 1662 MB

Test 2:
Data: rows 100000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
Summary:
Attribute size: original: 16505 bytes, after compression: 16050 bytes
Time for select: head: 35.6 sec, patch: 30 sec, *Performance Gain: 14%*
Table size: head: 1636 MB, patch: 1688 MB

Test 3:
Data: rows 50000
insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
Summary:
Attribute size: original: 31685 bytes, after compression: 30263 bytes
Time for select: head: 35.4 sec, patch: 25.5 sec, *Performance Gain: 28%*
Table size: head: 1601 MB, patch: 1601 MB

Test 4.a:
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(11), repeat('b', 250) || generate_att_data(3885);
Summary:
Attribute size: original: 3885 bytes, after compression: 3645 bytes
Time for select: head: 28 sec, patch: 26 sec, *Performance Gain: 7%*
*Table size: head: 872 MB, patch: 872 MB*

Test 4.b (High compression):
Data: rows 200000
insert into t1_lz4(a, b) select generate_att_data(364), repeat(generate_att_data(1980), 2);
Summary:
Attribute size: original: 3966 bytes, after compression: 2012 bytes
Time for select: head: 27 sec, patch: 26 sec, *Performance Gain: 0%*
*Table size: head: 612 MB, patch: 872 MB*

This is the worst case for this optimization, for two reasons.
First, the table grows by about 50% when the compressed size is half the
original size and yet barely large enough for TOASTing. The table can't grow
more than that: if compression reduced the size any further, it would also
reduce the number of chunks, and the compressed value would be stored.
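Spelling out the arithmetic (using an illustrative ~2KB chunk size; the exact TOAST_MAX_CHUNK_SIZE depends on the block size):

```c
/* Ceiling division: chunks needed to externalize a value. The chunk
 * size here is illustrative; the real TOAST_MAX_CHUNK_SIZE is close
 * to 2KB for the default 8KB block size. */
enum { CHUNK_BYTES = 2000 };

static int chunks_for(int bytes)
{
    return (bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
}
```

With these figures, test 4.b's raw value (3966 bytes) and compressed value (2012 bytes) both round up to 2 chunks, so storing the uncompressed copy roughly doubles the external storage (612 MB vs 872 MB). By contrast, in test 3 the compressed value (30263 bytes) needs the same 16 chunks as the raw value (31685 bytes), and the table size is unchanged.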

Second, there is not much performance gain because the attribute is small:
almost twice as many values fit in a page, so each page access returns about
twice the number of rows, and a small value also means low
compression/decompression costs.

We have avoided such cases by applying the optimization only when the
attribute size is > 4KB.

Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1_0001_optimize_external_toast_storage.patch application/octet-stream 5.9 KB
