Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2021-07-27 06:06:24
Message-ID: b3406b45caa85f4b0ee5c18625f3250c@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Masahiko Sawada писал 2021-07-27 07:06:
> On Mon, Jul 26, 2021 at 11:01 PM Masahiko Sawada
> <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>> I'll experiment with the proposed ideas including this idea in more
>> scenarios and share the results tomorrow.
>>
>
> I've done some benchmarks for proposed data structures. In this trial,
> I've done with the scenario where dead tuples are concentrated on a
> particular range of table blocks (test 5-8), in addition to the
> scenarios I've done in the previous trial. Also, I've done benchmarks
> of each scenario while increasing table size. In the first test, the
> maximum block number of the table is 1,000,000 (i.g., 8GB table) and
> in the second test, it's 10,000,000 (80GB table). We can see how
> performance and memory consumption changes with a large-scale table.
> Here are the results:
>
> * Test 1
> select prepare(
> 1000000, -- max block
> 10, -- # of dead tuples per page
> 1, -- dead tuples interval within a page
> 1, -- # of consecutive pages having dead tuples
> 20 -- page interval
> );
>
> name | attach | attach | shuffled | size_x10 | attach_x10|
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
> array | 57.23 MB | 0.040 | 98.613 | 572.21 MB | 0.387 |
> 1521.981
> intset | 46.88 MB | 0.114 | 75.944 | 468.67 MB | 0.961 |
> 997.760
> radix | 40.26 MB | 0.102 | 18.427 | 336.64 MB | 0.797 |
> 266.146
> rtbm | 64.02 MB | 0.234 | 22.443 | 512.02 MB | 2.230 |
> 275.143
> svtm | 27.28 MB | 0.060 | 13.568 | 274.07 MB | 0.476 |
> 211.073
> tbm | 96.01 MB | 0.273 | 10.347 | 768.01 MB | 2.882 |
> 128.103
>
> * Test 2
> select prepare(
> 1000000, -- max block
> 10, -- # of dead tuples per page
> 1, -- dead tuples interval within a page
> 1, -- # of consecutive pages having dead tuples
> 1 -- page interval
> );
>
> name | attach | attach | shuffled | size_x10 | attach_x10|
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
> array | 57.23 MB | 0.041 | 4.757 | 572.21 MB | 0.344 |
> 71.228
> intset | 46.88 MB | 0.127 | 3.762 | 468.67 MB | 1.093 |
> 49.573
> radix | 9.95 MB | 0.048 | 0.679 | 82.57 MB | 0.371 |
> 16.211
> rtbm | 34.02 MB | 0.179 | 0.534 | 288.02 MB | 2.092 |
> 8.693
> svtm | 5.78 MB | 0.043 | 0.239 | 54.60 MB | 0.342 |
> 7.759
> tbm | 96.01 MB | 0.274 | 0.521 | 768.01 MB | 2.685 |
> 6.360
>
> * Test 3
> select prepare(
> 1000000, -- max block
> 2, -- # of dead tuples per page
> 100, -- dead tuples interval within a page
> 1, -- # of consecutive pages having dead tuples
> 1 -- page interval
> );
>
> name | attach | attach | shuffled | size_x10 | attach_x10|
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
> array | 11.45 MB | 0.009 | 57.698 | 114.45 MB | 0.076 |
> 1045.639
> intset | 15.63 MB | 0.031 | 46.083 | 156.23 MB | 0.243 |
> 848.525
> radix | 40.26 MB | 0.063 | 13.755 | 336.64 MB | 0.501 |
> 223.413
> rtbm | 36.02 MB | 0.123 | 11.527 | 320.02 MB | 1.843 |
> 180.977
> svtm | 9.28 MB | 0.053 | 9.631 | 92.59 MB | 0.438 |
> 212.626
> tbm | 96.01 MB | 0.228 | 10.381 | 768.01 MB | 2.258 |
> 126.630
>
> * Test 4
> select prepare(
> 1000000, -- max block
> 100, -- # of dead tuples per page
> 1, -- dead tuples interval within a page
> 1, -- # of consecutive pages having dead tuples
> 1 -- page interval
> );
>
> name | attach | attach | shuffled | size_x10 | attach_x10|
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
> array | 572.21 MB | 0.367 | 78.047 | 5722.05 MB | 3.942 |
> 1154.776
> intset | 93.74 MB | 0.777 | 45.146 | 937.34 MB | 7.716 |
> 643.708
> radix | 40.26 MB | 0.203 | 9.015 | 336.64 MB | 1.775 |
> 133.294
> rtbm | 36.02 MB | 0.369 | 5.639 | 320.02 MB | 3.823 |
> 88.832
> svtm | 7.28 MB | 0.294 | 3.891 | 73.60 MB | 2.690 |
> 103.744
> tbm | 96.01 MB | 0.534 | 5.223 | 768.01 MB | 5.679 |
> 60.632
>
>
> * Test 5
> select prepare(
> 1000000, -- max block
> 150, -- # of dead tuples per page
> 1, -- dead tuples interval within a page
> 10000, -- # of consecutive pages having dead tuples
> 20000 -- page interval
> );
>
> There are 10000 consecutive pages that have 150 dead tuples at every
> 20000 pages.
>
> name | attach | attach | shuffled | size_x10 | attach_x10|
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
> array | 429.16 MB | 0.274 | 75.664 | 4291.54 MB | 3.067 |
> 1259.501
> intset | 46.88 MB | 0.559 | 36.449 | 468.67 MB | 4.565 |
> 517.445
> radix | 20.26 MB | 0.166 | 8.466 | 196.90 MB | 1.273 |
> 166.587
> rtbm | 18.02 MB | 0.242 | 8.491 | 160.02 MB | 2.407 |
> 171.725
> svtm | 3.66 MB | 0.243 | 3.635 | 37.10 MB | 2.022 |
> 86.165
> tbm | 48.01 MB | 0.344 | 9.763 | 384.01 MB | 3.327 |
> 151.824
>
> * Test 6
> select prepare(
> 1000000, -- max block
> 10, -- # of dead tuples per page
> 1, -- dead tuples interval within a page
> 10000, -- # of consecutive pages having dead tuples
> 20000 -- page interval
> );
>
> There are 10000 consecutive pages that have 10 dead tuples at every
> 20000 pages.
>
> name | attach | attach | shuffled | size_x10 | attach_x10|
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
> array | 28.62 MB | 0.022 | 2.791 | 286.11 MB | 0.170 |
> 46.920
> intset | 23.45 MB | 0.061 | 2.156 | 234.34 MB | 0.501 |
> 32.577
> radix | 5.04 MB | 0.026 | 0.433 | 48.57 MB | 0.191 |
> 11.060
> rtbm | 17.02 MB | 0.074 | 0.533 | 144.02 MB | 0.954 |
> 11.502
> svtm | 3.16 MB | 0.023 | 0.206 | 27.60 MB | 0.175 |
> 4.886
> tbm | 48.01 MB | 0.132 | 0.656 | 384.01 MB | 1.284 |
> 10.231
>
> * Test 7
> select prepare(
> 1000000, -- max block
> 150, -- # of dead tuples per page
> 1, -- dead tuples interval within a page
> 1000, -- # of consecutive pages having dead tuples
> 999000 -- page interval
> );
>
> There are pages that have 150 dead tuples at first 1000 blocks and
> last 1000 blocks.
>
> name | attach | attach | shuffled | size_x10 | attach_x10|
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
> array | 1.72 MB | 0.002 | 7.507 | 17.17 MB | 0.011 |
> 76.510
> intset | 0.20 MB | 0.003 | 6.742 | 1.89 MB | 0.022 |
> 52.122
> radix | 0.20 MB | 0.001 | 1.023 | 1.07 MB | 0.007 |
> 12.023
> rtbm | 0.15 MB | 0.001 | 2.637 | 0.65 MB | 0.009 |
> 34.528
> svtm | 0.52 MB | 0.002 | 0.721 | 0.61 MB | 0.010 |
> 6.434
> tbm | 0.20 MB | 0.002 | 2.733 | 1.51 MB | 0.015 |
> 38.538
>
> * Test 8
> select prepare(
> 1000000, -- max block
> 100, -- # of dead tuples per page
> 1, -- dead tuples interval within a page
> 50, -- # of consecutive pages having dead tuples
> 100 -- page interval
> );
>
> There are 50 consecutive pages that have 100 dead tuples at every 100
> pages.
>
> name | attach | attach | shuffled | size_x10 | attach_x10|
> shuffled_x10
> --------+-----------+--------+----------+------------+-----------+-------------
> array | 286.11 MB | 0.184 | 67.233 | 2861.03 MB | 1.743 |
> 979.070
> intset | 46.88 MB | 0.389 | 35.176 | 468.67 MB | 3.698 |
> 505.322
> radix | 21.82 MB | 0.116 | 6.160 | 186.86 MB | 0.891 |
> 117.730
> rtbm | 18.02 MB | 0.182 | 5.909 | 160.02 MB | 1.870 |
> 112.550
> svtm | 4.28 MB | 0.152 | 3.213 | 37.60 MB | 1.383 |
> 79.073
> tbm | 48.01 MB | 0.265 | 6.673 | 384.01 MB | 2.586 |
> 101.327
>
> Overall, 'svtm' is faster and consumes less memory. 'radix' tree also
> has good performance and memory usage.
>
> From these results, svtm is the best data structure among proposed
> ideas for dead tuple storage used during lazy vacuum in terms of
> performance and memory usage. I think it can support iteration by
> extracting the offset of dead tuples for each block while iterating
> chunks.
>
> Apart from performance and memory usage points of view, we also need
> to consider the reusability of the code. When I started this thread, I
> thought the best data structure would be the one optimized for
> vacuum's dead tuple storage. However, if we can use a data structure
> that can also be used in general, we can use it also for other
> purposes. Moreover, if it's too optimized for the current TID system
> (32 bits block number, 16 bits offset number, maximum block/offset
> number, etc.) it may become a blocker for future changes.
>
> In that sense, radix tree also seems good since it can also be used in
> gist vacuum as a replacement for intset, or a replacement for hash
> table for shared buffer as discussed before. Are there any other use
> cases? On the other hand, I’m concerned that radix tree would be an
> over-engineering in terms of vacuum's dead tuples storage since the
> dead tuple storage is static data and requires only lookup operation,
> so if we want to use radix tree as dead tuple storage, I'd like to see
> further use cases.

I can evolve svtm to transparent intset replacement certainly. Using
same trick from radix_to_key it will store tids efficiently:

shift = pg_ceil_log2_32(MaxHeapTuplesPerPage);
tid_i = ItemPointerGetOffsetNumber(tid);
tid_i |= ItemPointerGetBlockNumber(tid) << shift;

Will do today's evening.

regards
Yura Sokolov aka funny_falcon

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-07-27 06:09:51 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Bharath Rupireddy 2021-07-27 06:06:05 Re: Inaccurate error message when set fdw batch_size to 0