Re: Optimize external TOAST storage

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: davinder singh <davindersingh2692(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize external TOAST storage
Date: 2022-03-08 16:14:41
Message-ID: CAFiTN-vDda8RnakbHK5z9LNWQ3buKzOhRMXsZEcNqLuQCGdQAw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 28, 2022 at 3:22 PM davinder singh
<davindersingh2692(at)gmail(dot)com> wrote:
>
> Hi,
>
> For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
> and then divided into chunks. The problem with storing the compressed value is
> that if compression does not save enough space to reduce the number of chunks,
> we end up paying the extra decompression cost on every read.
> Based on the discussion with Robert and Dilip, we tried to optimize this
> process. The idea is to TOAST the compressed value only if it saves at least
> 1 chunk (2KB by default) of disk storage; otherwise, store the uncompressed
> value. This way we avoid the decompression cost without losing much storage.
>
> In our tests, we have observed read performance improvements of up to 28% by
> giving up at most TOAST_MAX_CHUNK_SIZE (2KB) of disk storage per value. The
> gain is larger for big attributes (size > 4KB) because compression/decompression
> cost grows with size.
> However, we have found that this assumption does not hold when the data
> compresses very well and is barely big enough to trigger TOASTing. For example,
> in test 4.b below, we got no performance advantage, but the table size grew by
> 42% from storing uncompressed values.
> Test Setup:
>
> Create table t1_lz4 ( a text compression lz4, b text compression lz4);
> -- Generate roughly len_info bytes of pseudo-random text data
> create or replace function generate_att_data(len_info int)
> returns text
> language plpgsql
> as
> $$
> declare
>     value text;
> begin
>     -- each md5 hash is 32 chars plus a 1-char separator, ~33 bytes per element
>     select array_agg(md5(g::text))
>       into value
>       from generate_series(1, round(len_info / 33)::int) g;
>     return value;
> end;
> $$;
>
> -- Test query
> Select b from t1_lz4;
>
> Test 1:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 7925 bytes, after compression: 7845 bytes
> Time for select: head: 42 sec, patch: 37 sec, Performance Gain: 11%
> table size: Head 1662 MB, Patch: 1662 MB
>
> Test 2:
> Data: rows 100000
> insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
> Summary:
> Attribute size: original: 16505 bytes, after compression: 16050 bytes
> Time for select: head: 35.6 sec, patch: 30 sec, Performance Gain: 14%
> table size: Head 1636 MB, Patch: 1688 MB
>
> Test 3:
> Data: rows 50000
> insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
> Summary:
> Attribute size: original: 31685 bytes, after compression: 30263 bytes
> Time for select: head: 35.4 sec, patch: 25.5 sec, Performance Gain: 28%
> table size: Head 1601 MB, Patch: 1601 MB
>
> Test 4.a:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250) || generate_att_data(3885);
> Summary:
> Attribute size: original: 3885 bytes, after compression: 3645 bytes
> Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
> table size: Head 872 MB, Patch: 872 MB
>
> Test 4.b (High compression):
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 3966 bytes, after compression: 2012 bytes
> Time for select: head: 27 sec, patch: 26 sec, Performance Gain: 0%
> table size: Head 612 MB, Patch: 872 MB

I think the idea looks interesting and the results are also promising.
I have a few initial comments; I will do a more detailed review later.
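
IIUC, the core rule can be sketched like this (a minimal sketch, not the
patch's actual code; the helper name is invented, TOAST_MAX_CHUNK_SIZE is
the existing constant):

    #include "postgres.h"
    #include "access/heaptoast.h"   /* for TOAST_MAX_CHUNK_SIZE */

    /*
     * Keep the compressed datum for external storage only if it occupies
     * at least one fewer TOAST chunk than the raw datum would.
     */
    static bool
    compression_saves_a_chunk(Size raw_size, Size compressed_size)
    {
        Size raw_chunks = (raw_size + TOAST_MAX_CHUNK_SIZE - 1) / TOAST_MAX_CHUNK_SIZE;
        Size cmp_chunks = (compressed_size + TOAST_MAX_CHUNK_SIZE - 1) / TOAST_MAX_CHUNK_SIZE;

        return cmp_chunks < raw_chunks;
    }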

1.
+ if (*value == orig_toast_value)
+     toast_tuple_externalize(ttc, attribute, options);

Wouldn't it be cleaner to check whether the attribute is compressed or
not, using VARATT_IS_COMPRESSED(DatumGetPointer(*value))?
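
That is, something along these lines (just a sketch of the suggestion,
surrounding code as in the patch hunk above):

+ if (VARATT_IS_COMPRESSED(DatumGetPointer(*value)))
+     toast_tuple_externalize(ttc, attribute, options);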

2.
+ /* Using the uncompressed data instead, deleting compressed data. */
+ pfree(DatumGetPointer(*value));

Change the comment as below:
s/deleting compressed data/release memory for the compressed data/

3.
+ memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+ memcpy(toast_values_copy, toast_values, sizeof(toast_values));
+

Add a comment here explaining what we are copying and why.
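
Something like this, perhaps (my reading of the intent; please correct
it if I have it wrong):

+ /*
+  * Remember the compressed values and per-column flags so that, if we
+  * later fall back to the uncompressed datum for a column, we can
+  * still find and free its compressed copy.
+  */
+ memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+ memcpy(toast_values_copy, toast_values, sizeof(toast_values));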

4.
+ /* incompressible, ignore on subsequent compression passes. */
+ orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;

Do we really need to set TOASTCOL_INCOMPRESSIBLE while externalizing the
attribute? The comment says "ignore on subsequent compression passes",
but will there be any more compression passes after this point? If the
flag does need to be set here, the comment should explain why.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
