Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-12-09 13:32:50
Message-ID: CAD21AoAAAv7r7S9b0vTXFnASGE8HehKRoiDEST99DmLjyxpDkw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 9, 2022 at 5:53 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
>
> On Fri, Dec 9, 2022 at 8:20 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > In the meanwhile, I've been working on vacuum integration. There are
> > two things I'd like to discuss some time:
> >
> > The first is the minimum of maintenance_work_mem, 1 MB. Since the
> > initial DSA segment size is 1MB (DSA_INITIAL_SEGMENT_SIZE), parallel
> > vacuum with radix tree cannot work with the minimum
> > maintenance_work_mem. It will need to increase it to 4MB or so. Maybe
> > we can start a new thread for that.
>
> I don't think that'd be very controversial, but I'm also not sure why we'd need 4MB -- can you explain in more detail what exactly we'd need so that the feature would work? (The minimum doesn't have to work *well* IIUC, just do some useful work and not fail).

The minimum requirement is 2MB. In the PoC patch, TIDStore checks how big
the radix tree is using dsa_get_total_size(). If the size returned by
dsa_get_total_size() (plus some memory used for TIDStore meta information)
exceeds maintenance_work_mem, lazy vacuum starts index vacuuming and heap
vacuuming. However, when allocating DSA memory for radix_tree_control at
creation time, we allocate a 1MB (DSA_INITIAL_SEGMENT_SIZE) DSM segment
and take the memory required for radix_tree_control from it, so
dsa_get_total_size() returns 1MB even if no TIDs have been collected yet.
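
For illustration, here is a minimal sketch of the check I have in mind
(TidStoreIsFull() and the struct fields are made-up names for this example,
not the PoC's actual API):

#include "postgres.h"
#include "utils/dsa.h"

typedef struct TidStore
{
    dsa_area   *area;           /* DSA area backing the shared radix tree */
    size_t      control_size;   /* TIDStore meta information */
} TidStore;

static inline bool
TidStoreIsFull(TidStore *ts, size_t max_bytes)
{
    /*
     * dsa_get_total_size() counts every DSM segment backing the area, so
     * it reports at least DSA_INITIAL_SEGMENT_SIZE (1MB) even before any
     * TID has been inserted.
     */
    return dsa_get_total_size(ts->area) + ts->control_size > max_bytes;
}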

>
> > The second is how to limit the size of the radix tree to
> > maintenance_work_mem. I think that it's tricky to estimate the maximum
> > number of keys in the radix tree that fit in maintenance_work_mem. The
> > radix tree size varies depending on the key distribution. The next
> > idea I considered was how to limit the size when inserting a key. In
> > order to strictly limit the radix tree size, probably we have to
> > change the rt_set so that it breaks off and returns false if the radix
> > tree size is about to exceed the memory limit when we allocate a new
> > node or grow a node kind/class.
>
> That seems complex, fragile, and wrong scope.
>
> > Ideally, I'd like to control the size
> > outside of radix tree (e.g. TIDStore) since it could introduce
> > overhead to rt_set() but probably we need to add such logic in radix
> > tree.
>
> Does the TIDStore have the ability to ask the DSA (or slab context) to see how big it is?

Yes, TIDStore can check it using dsa_get_total_size().
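
To be concrete, a sketch of how such a query could look in both cases;
using MemoryContextMemAllocated() on the slab context for the local case
is my assumption here, not necessarily what the PoC does:

#include "postgres.h"
#include "utils/dsa.h"
#include "utils/memutils.h"

static size_t
tidstore_backing_size(dsa_area *area, MemoryContext slab_context)
{
    if (area != NULL)
        return dsa_get_total_size(area);    /* shared radix tree */

    /* local radix tree backed by a slab memory context */
    return MemoryContextMemAllocated(slab_context, true);
}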

> If a new segment has been allocated that brings us to the limit, we can stop when we discover that fact. In the local case with slab blocks, it won't be on nice neat boundaries, but we could check if we're within the largest block size (~64kB) of overflow.
>
> Remember when we discussed how we might approach parallel pruning? I envisioned a local array of a few dozen kilobytes to reduce contention on the tidstore. We could use such an array even for a single worker (always doing the same thing is simpler anyway). When the array fills up enough so that the next heap page *could* overflow it: Stop, insert into the store, and check the store's memory usage before continuing.

Right, I think it's not a problem in the slab case. In the DSA case, the
sizes of new segments follow a geometric series, approximately doubling
the total storage each time a new segment is created. This behavior comes
from the fact that the underlying DSM system isn't designed for large
numbers of segments.
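
As a rough illustration (not dsa.c's exact sizing rules): if each new
segment approximately doubles the total, the last segment added before we
notice the limit can be a large fraction of the limit itself.

#include <stdio.h>

int
main(void)
{
    size_t  segment = 1024 * 1024;      /* DSA_INITIAL_SEGMENT_SIZE, 1MB */
    size_t  total = 0;

    /* grow until an example 64MB limit is reached */
    while (total < 64 * 1024 * 1024)
    {
        total += segment;
        printf("new segment %zuMB -> total %zuMB\n",
               segment / (1024 * 1024), total / (1024 * 1024));
        segment = total;        /* next segment roughly doubles the total */
    }
    return 0;
}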

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
