Quick Links

Re: [PoC] Improve dead tuple storage for lazy vacuum

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PoC] Improve dead tuple storage for lazy vacuum
Date:	2023-02-09 07:08:03
Message-ID:	CAD21AoDePt8x14Vrg+xLFd2akRAKHS_8j2uVmyoRVMYOze-VSw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On Tue, Feb 7, 2023 at 6:25 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
>
> On Tue, Jan 31, 2023 at 9:43 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > I've attached v24 patches. The locking support patch is separated
> > (0005 patch). Also I kept the updates for TidStore and the vacuum
> > integration from v23 separate.
>
> Okay, that's a lot more simple, and closer to what I imagined. For v25, I squashed v24's additions and added a couple of my own. I've kept the CF status at "needs review" because no specific action is required at the moment.
>
> I did start to review the TID store some more, but that's on hold because something else came up: On a lark I decided to re-run some benchmarks to see if anything got lost in converting to a template, and that led me down a rabbit hole -- some good and bad news on that below.
>
> 0001:
>
> I removed the uint64 case, as discussed. There is now a brief commit message, but needs to be fleshed out a bit. I took another look at the Arm optimization that Nathan found some month ago, for forming the highbit mask, but that doesn't play nicely with how node32 uses it, so I decided against it. I added a comment to describe the reasoning in case someone else gets a similar idea.
>
> I briefly looked into "separate-commit TODO: move non-SIMD fallbacks to their own header to clean up the #ifdef maze.", but decided it wasn't such a clear win to justify starting the work now. It's still in the back of my mind, but I removed the reminder from the commit message.

The changes make sense to me.

>
> 0003:
>
> The template now requires the value to be passed as a pointer. That was a pretty trivial change, but affected multiple other patches, so not sent separately. Also adds a forgotten RT_ prefix to the bitmap macros and adds a top comment to the *_impl.h headers. There are some comment fixes. The changes were either trivial or discussed earlier, so also not sent separately.

Great.

>
> 0004/5: I wanted to measure the load time as well as search time in bench_search_random_nodes(). That's kept separate to make it easier to test other patch versions.
>
> The bad news is that the speed of loading TIDs in bench_seq/shuffle_search() has regressed noticeably. I can't reproduce this in any other bench function and was the reason for writing 0005 to begin with. More confusingly, my efforts to fix this improved *other* functions, but the former didn't budge at all. First the patches:
>
> 0006 adds and removes some "inline" declarations (where it made sense), and added some for "pg_noinline" based on Andres' advice some months ago.

Agreed.

>
> 0007 removes some dead code. RT_NODE_INSERT_INNER is only called during RT_SET_EXTEND, so it can't possibly find an existing key. This kind of change is much easier with the inner/node cases handled together in a template, as far as being sure of how those cases are different. I thought about trying the search in assert builds and verifying it doesn't exist, but thought yet another #ifdef would be too messy.

Agreed.

>
> v25-addendum-try-no-maintain-order.txt -- It makes optional keeping the key chunks in order for the linear-search nodes. I believe the TID store no longer cares about the ordering, but this is a text file for now because I don't want to clutter the CI with a behavior change. Also, the second ART paper (on concurrency) mentioned that some locking schemes don't allow these arrays to be shifted. So it might make sense to give up entirely on guaranteeing ordered iteration, or at least make it optional as in the patch.

I think it's still important for lazy vacuum that an iteration over a
TID store returns TIDs in ascending order, because otherwise a heap
vacuum does random writes. That being said, we can have
RT_ITERATE_NEXT() return key-value pairs in an order regardless of how
the key chunks are stored in a node.

> ========================================
> psql -c "select rt_load_ms, rt_search_ms from bench_seq_search(0, 1 * 1000 * 1000)"
> (min load time of three)
>
> v15:
> rt_load_ms | rt_search_ms
> ------------+--------------
> 113 | 455
>
> v25-0005:
> rt_load_ms | rt_search_ms
> ------------+--------------
> 135 | 456
>
> v25-0006 (inlining or not):
> rt_load_ms | rt_search_ms
> ------------+--------------
> 136 | 455
>
> v25-0007 (remove dead code):
> rt_load_ms | rt_search_ms
> ------------+--------------
> 135 | 455
>
> v25-addendum...txt (no ordering):
> rt_load_ms | rt_search_ms
> ------------+--------------
> 134 | 455
>
> Note: The regression seems to have started in v17, which is the first with a full template.
>
> Nothing so far has helped here, and previous experience has shown that trying to profile 100ms will not be useful. Instead of putting more effort into diving deeper, it seems a better use of time to write a benchmark that calls the tid store itself. That's more realistic, since this function was intended to test load and search of tids, but the tid store doesn't quite operate so simply anymore. What do you think, Masahiko?

Yeah, that's more realistic. TidStore now encodes TIDs slightly
differently from the benchmark test.

I've attached the patch that adds a simple benchmark test using
TidStore. With this test, I got similar trends of results to yours
with gcc, but I've not analyzed them in depth yet.

query: select * from bench_tidstore_load(0, 10 * 1000 * 1000)

v15:
load_ms
---------
816

v25-0007 (remove dead code):
load_ms
---------
839

v25-addendum...txt (no ordering):
load_ms
---------
820

BTW it would be better to remove the RT_DEBUG macro from bench_radix_tree.c.

>
> I'm inclined to keep 0006, because it might give a slight boost, and 0007 because it's never a bad idea to remove dead code.

Yeah, these two changes make sense to me too.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
0001-Add-bench_tidstore_load.patch.txt	text/plain	3.1 KB

In response to

Re: [PoC] Improve dead tuple storage for lazy vacuum at 2023-02-07 09:25:44 from John Naylor

Responses

Re: [PoC] Improve dead tuple storage for lazy vacuum at 2023-02-09 12:56:27 from John Naylor
Re: [PoC] Improve dead tuple storage for lazy vacuum at 2023-02-10 06:51:24 from John Naylor
Re: [PoC] Improve dead tuple storage for lazy vacuum at 2023-02-11 05:33:42 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jim Jones	2023-02-09 07:16:50	Re: [PATCH] Add pretty-printed XML output option
Previous Message	Andres Freund	2023-02-09 07:03:49	Re: Introduce a new view for checkpointer related stats