From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: davinder singh <davindersingh2692(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize external TOAST storage
Date: 2022-03-08 16:14:41
Message-ID: CAFiTN-vDda8RnakbHK5z9LNWQ3buKzOhRMXsZEcNqLuQCGdQAw@mail.gmail.com
Lists: pgsql-hackers
On Mon, Feb 28, 2022 at 3:22 PM davinder singh
<davindersingh2692(at)gmail(dot)com> wrote:
>
> Hi,
>
> For TOAST storage [1] in PostgreSQL, the attribute value is first compressed
> and then divided into chunks. The problem with storing the compressed value is
> that if compression does not save enough space to reduce the number of chunks,
> we pay an extra decompression cost on every read.
> Based on the discussion with Robert and Dilip, we tried to optimize this
> process. The idea is to TOAST the compressed value only if it saves at least
> one chunk (2KB by default) of disk storage; otherwise, store the uncompressed
> value. This way we avoid decompression costs without losing much storage.
>
> In our tests, we observed read-performance improvements of up to 28% while
> giving up at most TOAST_MAX_CHUNK_SIZE (2KB) of disk storage per value. The
> gain is larger for large attributes (size > 4KB) because compression and
> decompression costs grow with size.
> However, we found this assumption does not hold when the compression ratio is
> high and the value is barely large enough to be TOASTed. For example, in test
> 4.b below we saw no performance gain, but the table grew by 42% because
> uncompressed values were stored.
> Test Setup.
>
> Create table t1_lz4 ( a text compression lz4, b text compression lz4);
> -- Generate random data
> create or replace function generate_att_data(len_info int)
> returns text
> language plpgsql
> as
> $$
> declare
> value text;
> begin
> select array_agg(md5(g::text))
> into value
> from generate_series(1, round(len_info/33)::int) g;
> return value;
> end;
> $$;
>
> --Test
> Select b from t1_lz4;
>
> Test 1:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 7925 bytes, after compression: 7845 bytes
> Time for select: head: 42 sec, patch: 37 sec, Performance Gain: 11%
> table size: Head 1662 MB, Patch: 1662 MB
>
> Test 2:
> Data: rows 100000
> insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(16505);
> Summary:
> Attribute size: original: 16505 bytes, after compression: 16050 bytes
> Time for select: head: 35.6 sec, patch: 30 sec, Performance Gain: 14%
> table size: Head 1636 MB, Patch: 1688 MB
>
> Test 3:
> Data: rows 50000
> insert into t1_lz4(a, b) select generate_att_data(364), generate_att_data(31685);
> Summary:
> Attribute size: original: 31685 bytes, after compression: 30263 bytes
> Time for select: head: 35.4 sec, patch: 25.5 sec, Performance Gain: 28%
> table size: Head 1601 MB, Patch: 1601 MB
>
> Test 4.a:
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(11), repeat ('b', 250) || generate_att_data(3885);
> Summary:
> Attribute size: original: 3885 bytes, after compression: 3645 bytes
> Time for select: head: 28 sec, patch: 26 sec, Performance Gain: 7%
> table size: Head 872 MB, Patch: 872 MB
>
> Test 4.b (High compression):
> Data: rows 200000
> insert into t1_lz4(a, b) select generate_att_data(364), repeat (generate_att_data(1980), 2);
> Summary:
> Attribute size: original: 3966 bytes, after compression: 2012 bytes
> Time for select: head: 27 sec, patch: 26 sec, Performance Gain: 0%
> table size: Head 612 MB, Patch: 872 MB
I think the idea looks interesting and the results are promising.
I have a few initial comments; I will do a more detailed review later.
1.
+ if (*value == orig_toast_value)
+ toast_tuple_externalize(ttc, attribute, options);
Wouldn't it be cleaner to check whether the attribute is compressed,
like this:
VARATT_IS_COMPRESSED(DatumGetPointer(*value)) ?
2.
+ /* Using the uncompressed data instead, deleting compressed data. */
+ pfree(DatumGetPointer(*value));
Change the comment:
s/deleting compressed data/release memory for the compressed data/
3.
+ memcpy(toast_attr_copy, toast_attr, sizeof(toast_attr));
+ memcpy(toast_values_copy, toast_values, sizeof(toast_values));
+
Add a comment here explaining what we are copying and why.
4.
+ /* incompressible, ignore on subsequent compression passes. */
+ orig_attr->tai_colflags |= TOASTCOL_INCOMPRESSIBLE;
Do we need to set TOASTCOL_INCOMPRESSIBLE while externalizing the
attribute? The comment says "ignore on subsequent compression passes",
but will there be any more compression passes after this point? If
TOASTCOL_INCOMPRESSIBLE does need to be set here, the comment should
explain why.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com