Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-10-07 08:08:35
Message-ID: CAD21AoDPf_XHG8KuNM3DAvixZrKmFWW=6=eQF5EzkC6r6BKE+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 7, 2022 at 2:29 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> On Fri, Sep 16, 2022 at 1:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > In addition to two patches, I've attached the third patch. It's not
> > part of radix tree implementation but introduces a contrib module
> > bench_radix_tree, a tool for radix tree performance benchmarking. It
> > measures loading and lookup performance of both the radix tree and a
> > flat array.
>
> Hi Masahiko, I've been using these benchmarks, along with my own variations, to try various things that I've mentioned. I'm long overdue for an update, but the picture is not yet complete.

Thanks!

> For now, I have two questions that I can't figure out on my own:
>
> 1. There seems to be some non-obvious limit on the number of keys that are loaded (or at least what the numbers report). This is independent of the number of tids per block. Example below:
>
> john=# select * from bench_shuffle_search(0, 8*1000*1000);
> NOTICE: num_keys = 8000000, height = 3, n4 = 0, n16 = 1, n32 = 0, n128 = 250000, n256 = 981
> nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
> ---------+------------------+---------------------+------------+---------------+--------------+-----------------
> 8000000 | 268435456 | 48000000 | 661 | 29 | 276 | 389
>
> john=# select * from bench_shuffle_search(0, 9*1000*1000);
> NOTICE: num_keys = 8388608, height = 3, n4 = 0, n16 = 1, n32 = 0, n128 = 262144, n256 = 1028
> nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
> ---------+------------------+---------------------+------------+---------------+--------------+-----------------
> 8388608 | 276824064 | 54000000 | 718 | 33 | 311 | 446
>
> The array is the right size, but nkeys hasn't kept pace. Can you reproduce this? Attached is the patch I'm using to show the stats when running the test. (Side note: The numbers look unfavorable for radix tree because I'm using 1 tid per block here.)

Yes, I can reproduce this. In tid_to_key_off() we need to cast to
uint64 when packing offset number and block number:

tid_i = ItemPointerGetOffsetNumber(tid);
tid_i |= ItemPointerGetBlockNumber(tid) << shift;

>
> 2. I found that bench_shuffle_search() is much *faster* for traditional binary search on an array than bench_seq_search(). I've found this to be true in every case. This seems counterintuitive to me -- any idea why this is? Example:
>
> john=# select * from bench_seq_search(0, 1000000);
> NOTICE: num_keys = 1000000, height = 2, n4 = 0, n16 = 0, n32 = 31251, n128 = 1, n256 = 122
> nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
> ---------+------------------+---------------------+------------+---------------+--------------+-----------------
> 1000000 | 10199040 | 180000000 | 168 | 106 | 827 | 3348
>
> john=# select * from bench_shuffle_search(0, 1000000);
> NOTICE: num_keys = 1000000, height = 2, n4 = 0, n16 = 0, n32 = 31251, n128 = 1, n256 = 122
> nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
> ---------+------------------+---------------------+------------+---------------+--------------+-----------------
> 1000000 | 10199040 | 180000000 | 171 | 107 | 827 | 1400
>

Ugh, in shuffle_itemptrs(), we shuffled itemptrs instead of itemptr:

for (int i = 0; i < nitems - 1; i++)
{
int j = shuffle_randrange(&state, i, nitems - 1);
ItemPointerData t = itemptrs[j];

itemptrs[j] = itemptrs[i];
itemptrs[i] = t;

With the fix, the results on my environment were:

postgres(1:4093192)=# select * from bench_seq_search(0, 10000000);
2022-10-07 16:57:03.124 JST [4093192] LOG: num_keys = 10000000,
height = 3, n4 = 0, n16 = 1, n32 = 312500, n128 = 0, n256 = 1226
nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms |
array_load_ms | rt_search_ms | array_serach_ms
----------+------------------+---------------------+------------+---------------+--------------+-----------------
10000000 | 101826560 | 1800000000 | 846 |
486 | 6096 | 21128
(1 row)

Time: 28975.566 ms (00:28.976)
postgres(1:4093192)=# select * from bench_shuffle_search(0, 10000000);
2022-10-07 16:57:37.476 JST [4093192] LOG: num_keys = 10000000,
height = 3, n4 = 0, n16 = 1, n32 = 312500, n128 = 0, n256 = 1226
nkeys | rt_mem_allocated | array_mem_allocated | rt_load_ms |
array_load_ms | rt_search_ms | array_serach_ms
----------+------------------+---------------------+------------+---------------+--------------+-----------------
10000000 | 101826560 | 1800000000 | 845 |
484 | 32700 | 152583
(1 row)

I've attached a patch to fix them. Also, I realized that bsearch()
could be optimized out so I added code to prevent it:

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
fix_bench_radix_tree.patch application/x-patch 3.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2022-10-07 08:16:40 Re: Make ON_ERROR_STOP stop on shell script failure
Previous Message Fujii Masao 2022-10-07 07:59:48 Re: ps command does not show walsender's connected db