Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-01-03 14:10:11
Message-ID: CANWCAZa5tqBNRNGj4Lq+=9Z7tu1Nmq1j1hwngNfz1kvh8Jytaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 2, 2024 at 8:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> I agree that we expose RT_LOCK_* functions and have tidstore use them,
> but am not sure the if (TidStoreIsShared(ts) LWLockAcquire(..., ...)"
> calls part. I think that even if we expose them, we will still need to
> do something like "if (TidStoreIsShared(ts))
> shared_rt_lock_share(ts->tree.shared)", no?

I'll come back to this topic separately.

> I've attached a new patch set. From v47 patch, I've merged your
> changes for radix tree, and split the vacuum integration patch into 3
> patches: simply replaces VacDeadItems with TidsTore (0007 patch), and
> use a simple TID array for one-pass strategy (0008 patch), and replace
> has_lpdead_items with "num_offsets > 0" (0009 patch), while
> incorporating your review comments on the vacuum integration patch

Nice!

> (sorry for making it difficult to see the changes from v47 patch).

It's actually pretty clear. I just have a couple comments before
sharing my latest cleanups:

(diff'ing between v47 and v48):

-- /*
- * In the shared case, TidStoreControl and radix_tree are backed by the
- * same DSA area and rt_memory_usage() returns the value including both.
- * So we don't need to add the size of TidStoreControl separately.
- */
if (TidStoreIsShared(ts))
- return sizeof(TidStore) +
shared_rt_memory_usage(ts->tree.shared);
+ rt_mem = shared_rt_memory_usage(ts->tree.shared);
+ else
+ rt_mem = local_rt_memory_usage(ts->tree.local);

- return sizeof(TidStore) + sizeof(TidStore) +
local_rt_memory_usage(ts->tree.local);
+ return sizeof(TidStore) + sizeof(TidStoreControl) + rt_mem;

Upthread, I meant that I don't see the need to include the size of
these structs *at all*. They're tiny, and the blocks/segments will
almost certainly have some empty space counted in the total anyway.
The returned size is already overestimated, so this extra code is just
a distraction.

- if (result->num_offsets + bmw_popcount(w) > result->max_offset)
+ if (result->num_offsets + (sizeof(bitmapword) * BITS_PER_BITMAPWORD)
>= result->max_offset)

I believe this math is wrong. We care about "result->num_offsets +
BITS_PER_BITMAPWORD", right?
Also, it seems if the condition evaluates to equal, we still have
enough space, in which case ">" the max is the right condition.

- if (off < 1 || off > MAX_TUPLES_PER_PAGE)
+ if (off < 1 || off > MaxOffsetNumber)

This can now use OffsetNumberIsValid().

> 0013 to 0015 patches are also updates from v47 patch.

> I'm thinking that we should change the order of the patches so that
> tidstore patch requires the patch for changing DSA segment sizes. That
> way, we can remove the complex max memory calculation part that we no
> longer use from the tidstore patch.

I don't think there is any reason to have those calculations at all at
this point. Every patch in every version should at least *work
correctly*, without kludging m_w_m and without constraining max
segment size. I'm fine with the latter remaining in its own thread,
and I hope we can consider it an enhancement that respects the admin's
configured limits more effectively, and not a pre-requisite for not
breaking. I *think* we're there now, but it's hard to tell since 0015
was at the very end. As I said recently, if something still fails, I'd
like to know why. So for v49, I took the liberty of removing the DSA
max segment patches for now, and squashing v48-0015.

In addition for v49, I have quite a few cleanups:

0001 - This hasn't been touched in a very long time, but I ran
pgindent and clarified a comment
0002 - We no longer need to isolate the rightmost bit anywhere, so
removed that part and revised the commit message accordingly.

radix tree:
0003 - v48 plus squashed v48-0013
0004 - Removed or adjusted WIP, FIXME, TODO items. Some were outdated,
and I fixed most of the rest.
0005 - Remove the RT_PTR_LOCAL macro, since it's not really useful anymore.
0006 - RT_FREE_LEAF only needs the allocated pointer, so pass that. A
bit simpler.
0007 - Uses the same idea from a previous cleanup of RT_SET, for RT_DELETE.
0008 - Removes a holdover from the multi-value leaves era.
0009 - It occurred to me that we need to have unique names for memory
contexts for different instantiations of the template. This is one way
to do it, by using the configured RT_PREFIX in the context name. I
also took an extra step to make the size class fanout show up
correctly on different platforms, but that's probably overkill and
undesirable, and I'll probably use only the class name next time.
0010/11 - Make the array functions less surprising and with more
informative names.
0012 - Restore a useful technique from Andres's prototype. This part
has been slow for a long time, so much that it showed up in a profile
where this path wasn't even taken much.

tid store / vacuum:
0013/14 - Same as v48 TID store, with review squashed
0015 - Rationalize comment and starting value.
0016 - I applied the removal of the old clamps from v48-0011 (init/max
DSA), and left out the rest for now.
0017-20 - Vacuum and debug tidstore as in v48, with v48-0015 squashed

I'll bring up locking again shortly.

Attachment Content-Type Size
v49-ART.tar.gz application/gzip 69.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2024-01-03 14:12:46 Re: Change GUC hashtable to use simplehash?
Previous Message Dave Cramer 2024-01-03 13:59:50 Re: Password leakage avoidance