Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-05-10 01:51:46
Message-ID: CAD21AoDFaBQrxCQpLetqeoiJCQn7N1G+Sf+QZQy0sn6Wa8mSsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sun, Feb 13, 2022 at 12:39 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2022-02-13 12:36:13 +0900, Masahiko Sawada wrote:
> > Actually, I'm working on simplifying and improving radix tree
> > implementation for PG16 dev cycle. From the discussion so far I think
> > it's better to have a data structure that can be used for
> > general-purpose and is also good for storing TID, not very specific to
> > store TID. So I think radix tree would be a potent candidate. I have
> > done the insertion and search implementation.
>
> Awesome!

To move this project forward, I've implemented radix tree
implementation from scratch while studying Andres's implementation. It
supports insertion, search, and iteration but not deletion yet. In my
implementation, I use Datum as the value so internal and lead nodes
have the same data structure, simplifying the implementation. The
iteration on the radix tree returns keys with the value in ascending
order of the key. The patch has regression tests for radix tree but is
still in PoC state: left many debugging codes, not supported SSE2 SIMD
instructions, added -mavx2 flag is hard-coded.

I've measured the size and loading and lookup performance for each
candidate data structure with two test cases: dense and sparse, by
using the test tool[1]. Here are the results:

* Case1 - Dense (simulating the case where there are 1000 consecutive
pages each of which has 100 dead tuples, at 100 page intervals.)
select prepare(
1000000, -- max block
100, -- # of dead tuples per page
1, -- dead tuples interval within a page
1000, -- # of consecutive pages having dead tuples
1100 -- page interval
);

name size attach lookup
array 520 MB 248.60 ms 89891.92 ms
hash 3188 MB 28029.59 ms 50850.32 ms
intset 85 MB 644.96 ms 39801.17 ms
tbm 96 MB 474.06 ms 6641.38 ms
radix 37 MB 173.03 ms 9145.97 ms
radix_tree 36 MB 184.51 ms 9729.94 ms

* Case2 - Sparse (simulating a case where there are pages that have 2
dead tuples every 1000 pages.)
select prepare(
10000000, -- max block
2, -- # of dead tuples per page
50, -- dead tuples interval within a page
1, -- # of consecutive pages having dead tuples
1000 -- page interval
);

name size attach lookup
array 125 kB 0.53 ms 82183.61 ms
hash 1032 kB 1.31 ms 28128.33 ms
intset 222 kB 0.51 ms 87775.68 ms
tbm 768 MB 1.24 ms 98674.60 ms
radix 1080 kB 1.66 ms 20698.07 ms
radix_tree 949 kB 1.50 ms 21465.23 ms

Each test virtually generates TIDs and loads them to the data
structure, and then searches for virtual index TIDs.
'array' is a sorted array which is the current method, 'hash' is HTAB,
'intset' is IntegerSet, and 'tbm' is TIDBitmap. The last two results
are radix tree implementations: 'radix' is Andres's radix tree
implementation and 'radix_tree' is my radix tree implementation. In
both radix tree tests, I converted TIDs into an int64 and store the
lower 6 bits in the value part of the radix tree.

Overall, radix tree implementations have good numbers. Once we got an
agreement on moving in this direction, I'll start a new thread for
that and move the implementation further; there are many things to do
and discuss: deletion, API design, SIMD support, more tests etc.

Regards,

[1] https://github.com/MasahikoSawada/pgtools/tree/master/bdbench
[2] https://www.postgresql.org/message-id/CAFiTN-visUO9VTz2%2Bh224z5QeUjKhKNdSfjaCucPhYJdbzxx0g%40mail.gmail.com

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
radixtree.patch application/octet-stream 51.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-05-10 01:52:51 Re: waiting for reload in tests
Previous Message Tom Lane 2022-05-10 01:42:20 Re: waiting for reload in tests