Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2021-07-27 04:06:56
Message-ID: CAD21AoATJQM4PuQ46NvQAAuxUcATmb7udFyummFHPN-QhUcL5g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 26, 2021 at 11:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I'll experiment with the proposed ideas including this idea in more
> scenarios and share the results tomorrow.
>

I've done some benchmarks of the proposed data structures. This trial adds
scenarios where dead tuples are concentrated in a particular range of
table blocks (tests 5-8), in addition to the scenarios from the previous
trial. I've also run each scenario while increasing the table size: in
the first run, the maximum block number of the table is 1,000,000 (i.e.,
an 8GB table with the default 8KB block size), and in the second run it's
10,000,000 (an 80GB table), so we can see how performance and memory
consumption change with a large-scale table.

In the tables below, 'size' is the memory consumption, 'attach' is the
time in seconds to load all dead tuple TIDs into the data structure, and
'shuffled' is the time in seconds to look them up in random order; the
_x10 columns show the same metrics for the 10x larger table. Here are
the results:

* Test 1
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
20 -- page interval
);
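
To make the parameters concrete, here is my mental model of the
distribution prepare() generates, as a C sketch (the actual benchmark
module may differ; record_dead_tuple() and the parameter names are
illustrative only):

/*
 * Sketch of the TID distribution generated by prepare(): every
 * page_interval blocks, a run of consec_pages consecutive blocks gets
 * ntuples dead tuples each, at offsets 1, 1 + tup_interval, ...
 * record_dead_tuple() is a hypothetical helper that adds one TID to
 * the structure under test.
 */
static void
generate_dead_tuples(BlockNumber max_block, int ntuples,
                     int tup_interval, BlockNumber consec_pages,
                     BlockNumber page_interval)
{
    for (BlockNumber blkno = 0; blkno < max_block; blkno += page_interval)
    {
        /* a run of consec_pages consecutive pages with dead tuples */
        for (BlockNumber i = 0; i < consec_pages && blkno + i < max_block; i++)
        {
            /* ntuples dead tuples per page, tup_interval offsets apart */
            for (int j = 0; j < ntuples; j++)
                record_dead_tuple(blkno + i,
                                  (OffsetNumber) (1 + j * tup_interval));
        }
    }
}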

name | size | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+-----------+-------------
array | 57.23 MB | 0.040 | 98.613 | 572.21 MB | 0.387 | 1521.981
intset | 46.88 MB | 0.114 | 75.944 | 468.67 MB | 0.961 | 997.760
radix | 40.26 MB | 0.102 | 18.427 | 336.64 MB | 0.797 | 266.146
rtbm | 64.02 MB | 0.234 | 22.443 | 512.02 MB | 2.230 | 275.143
svtm | 27.28 MB | 0.060 | 13.568 | 274.07 MB | 0.476 | 211.073
tbm | 96.01 MB | 0.273 | 10.347 | 768.01 MB | 2.882 | 128.103

* Test 2
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

name | size | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+-----------+-------------
array | 57.23 MB | 0.041 | 4.757 | 572.21 MB | 0.344 | 71.228
intset | 46.88 MB | 0.127 | 3.762 | 468.67 MB | 1.093 | 49.573
radix | 9.95 MB | 0.048 | 0.679 | 82.57 MB | 0.371 | 16.211
rtbm | 34.02 MB | 0.179 | 0.534 | 288.02 MB | 2.092 | 8.693
svtm | 5.78 MB | 0.043 | 0.239 | 54.60 MB | 0.342 | 7.759
tbm | 96.01 MB | 0.274 | 0.521 | 768.01 MB | 2.685 | 6.360

* Test 3
select prepare(
1000000, -- max block
2, -- # of dead tuples per page
100, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

name | size | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+-----------+-------------
array | 11.45 MB | 0.009 | 57.698 | 114.45 MB | 0.076 | 1045.639
intset | 15.63 MB | 0.031 | 46.083 | 156.23 MB | 0.243 | 848.525
radix | 40.26 MB | 0.063 | 13.755 | 336.64 MB | 0.501 | 223.413
rtbm | 36.02 MB | 0.123 | 11.527 | 320.02 MB | 1.843 | 180.977
svtm | 9.28 MB | 0.053 | 9.631 | 92.59 MB | 0.438 | 212.626
tbm | 96.01 MB | 0.228 | 10.381 | 768.01 MB | 2.258 | 126.630

* Test 4
select prepare(
1000000, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1 -- page interval
);

name | size | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+-----------+-------------
array | 572.21 MB | 0.367 | 78.047 | 5722.05 MB | 3.942 | 1154.776
intset | 93.74 MB | 0.777 | 45.146 | 937.34 MB | 7.716 | 643.708
radix | 40.26 MB | 0.203 | 9.015 | 336.64 MB | 1.775 | 133.294
rtbm | 36.02 MB | 0.369 | 5.639 | 320.02 MB | 3.823 | 88.832
svtm | 7.28 MB | 0.294 | 3.891 | 73.60 MB | 2.690 | 103.744
tbm | 96.01 MB | 0.534 | 5.223 | 768.01 MB | 5.679 | 60.632

* Test 5
select prepare(
1000000, -- max block
150, -- # of dead tuples per page
1, -- dead tuples interval within a page
10000, -- # of consecutive pages having dead tuples
20000 -- page interval
);

Every 20,000 pages, there is a run of 10,000 consecutive pages, each
containing 150 dead tuples.

name | size | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+-----------+-------------
array | 429.16 MB | 0.274 | 75.664 | 4291.54 MB | 3.067 | 1259.501
intset | 46.88 MB | 0.559 | 36.449 | 468.67 MB | 4.565 | 517.445
radix | 20.26 MB | 0.166 | 8.466 | 196.90 MB | 1.273 | 166.587
rtbm | 18.02 MB | 0.242 | 8.491 | 160.02 MB | 2.407 | 171.725
svtm | 3.66 MB | 0.243 | 3.635 | 37.10 MB | 2.022 | 86.165
tbm | 48.01 MB | 0.344 | 9.763 | 384.01 MB | 3.327 | 151.824

* Test 6
select prepare(
1000000, -- max block
10, -- # of dead tuples per page
1, -- dead tuples interval within a page
10000, -- # of consecutive pages having dead tuples
20000 -- page interval
);

Every 20,000 pages, there is a run of 10,000 consecutive pages, each
containing 10 dead tuples.

name | size | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+-----------+-------------
array | 28.62 MB | 0.022 | 2.791 | 286.11 MB | 0.170 | 46.920
intset | 23.45 MB | 0.061 | 2.156 | 234.34 MB | 0.501 | 32.577
radix | 5.04 MB | 0.026 | 0.433 | 48.57 MB | 0.191 | 11.060
rtbm | 17.02 MB | 0.074 | 0.533 | 144.02 MB | 0.954 | 11.502
svtm | 3.16 MB | 0.023 | 0.206 | 27.60 MB | 0.175 | 4.886
tbm | 48.01 MB | 0.132 | 0.656 | 384.01 MB | 1.284 | 10.231

* Test 7
select prepare(
1000000, -- max block
150, -- # of dead tuples per page
1, -- dead tuples interval within a page
1000, -- # of consecutive pages having dead tuples
999000 -- page interval
);

Only the first 1,000 blocks and the last 1,000 blocks of the table have
dead tuples, 150 per page.

name | size | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+-----------+-------------
array | 1.72 MB | 0.002 | 7.507 | 17.17 MB | 0.011 | 76.510
intset | 0.20 MB | 0.003 | 6.742 | 1.89 MB | 0.022 | 52.122
radix | 0.20 MB | 0.001 | 1.023 | 1.07 MB | 0.007 | 12.023
rtbm | 0.15 MB | 0.001 | 2.637 | 0.65 MB | 0.009 | 34.528
svtm | 0.52 MB | 0.002 | 0.721 | 0.61 MB | 0.010 | 6.434
tbm | 0.20 MB | 0.002 | 2.733 | 1.51 MB | 0.015 | 38.538

* Test 8
select prepare(
1000000, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within a page
50, -- # of consecutive pages having dead tuples
100 -- page interval
);

Every 100 pages, there is a run of 50 consecutive pages, each containing
100 dead tuples.

name | size | attach | shuffled | size_x10 | attach_x10 | shuffled_x10
--------+-----------+--------+----------+------------+-----------+-------------
array | 286.11 MB | 0.184 | 67.233 | 2861.03 MB | 1.743 | 979.070
intset | 46.88 MB | 0.389 | 35.176 | 468.67 MB | 3.698 | 505.322
radix | 21.82 MB | 0.116 | 6.160 | 186.86 MB | 0.891 | 117.730
rtbm | 18.02 MB | 0.182 | 5.909 | 160.02 MB | 1.870 | 112.550
svtm | 4.28 MB | 0.152 | 3.213 | 37.60 MB | 1.383 | 79.073
tbm | 48.01 MB | 0.265 | 6.673 | 384.01 MB | 2.586 | 101.327

Overall, 'svtm' is the fastest and consumes the least memory in most
scenarios. The 'radix' tree also shows good performance and memory usage.

From these results, svtm is the best of the proposed data structures for
dead tuple storage during lazy vacuum in terms of both performance and
memory usage. I think it can also support iteration, by extracting the
offsets of the dead tuples for each block while iterating over its
chunks.
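
For example, the iteration interface I have in mind would be shaped
roughly like this (a hypothetical C sketch, not an existing svtm API;
all names and signatures are illustrative):

/* Hypothetical iterator: walks the map in block order and hands back
 * the dead tuple offsets for one block at a time. */
typedef struct SVTMIterState SVTMIterState;     /* opaque */

SVTMIterState *svtm_begin_iterate(SVTM *svtm);

/*
 * Returns false when no blocks remain.  On true, *blkno is set and
 * offsets[] is filled with *noffsets dead tuple offsets, extracted
 * from the chunk covering this block.
 */
bool svtm_iterate_next(SVTMIterState *iter,
                       BlockNumber *blkno,
                       OffsetNumber *offsets,
                       int *noffsets);

void svtm_end_iterate(SVTMIterState *iter);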

Apart from performance and memory usage, we also need to consider the
reusability of the code. When I started this thread, I thought the best
data structure would be one optimized specifically for vacuum's dead
tuple storage. However, if we can use a data structure that is also
useful in general, we can reuse it for other purposes. Moreover, if it's
too closely tied to the current TID system (32-bit block number, 16-bit
offset number, the maximum block/offset numbers, etc.), it may become a
blocker for future changes.
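
(For reference, the current TID representation, simplified from
PostgreSQL's src/include/storage/block.h, off.h, and itemptr.h:)

typedef uint32 BlockNumber;          /* block.h */
typedef uint16 OffsetNumber;         /* off.h */

/* the 32-bit block number is stored as two uint16s so that
 * ItemPointerData needs only 2-byte alignment */
typedef struct BlockIdData
{
    uint16      bi_hi;
    uint16      bi_lo;
} BlockIdData;

/* a TID is 6 bytes: 32-bit block number + 16-bit offset number */
typedef struct ItemPointerData
{
    BlockIdData  ip_blkid;
    OffsetNumber ip_posid;
} ItemPointerData;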

In that sense, the radix tree also seems good, since it could be used in
GiST vacuum as a replacement for intset, or as a replacement for the
hash table used for the shared buffer mapping, as discussed before. Are
there any other use cases? On the other hand, I'm concerned that a radix
tree would be over-engineering for vacuum's dead tuple storage, since
the dead tuple storage is static data and requires only lookup
operations. So if we want to use a radix tree as the dead tuple storage,
I'd like to see further use cases.
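
To illustrate the coupling concern: a radix tree over TIDs would
presumably encode each TID as an integer key along these lines (a sketch
assuming the current widths; the actual encoding is up for discussion):

/*
 * Sketch: pack a TID into a 64-bit radix tree key, block number in the
 * high bits and offset number in the low 16 bits.  Note how this bakes
 * the current 32-bit/16-bit TID widths into the key format -- exactly
 * the kind of assumption that could block future TID changes.
 */
static inline uint64
tid_to_key(BlockNumber blkno, OffsetNumber offnum)
{
    return ((uint64) blkno << 16) | (uint64) offnum;
}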

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
