Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-12-12 10:14:02
Message-ID: CAFBsxsEa91khH5oUv8GCNqzht4qG-nTiGDsqYCs_cUGd6q5wkA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 9, 2022 at 8:33 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Dec 9, 2022 at 5:53 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> >
> >

> > I don't think that'd be very controversial, but I'm also not sure why
> > we'd need 4MB -- can you explain in more detail what exactly we'd need
> > so that the feature would work? (The minimum doesn't have to work
> > *well* IIUC, just do some useful work and not fail).
>
> The minimum requirement is 2MB. In the PoC patch, TIDStore checks how
> big the radix tree is using dsa_get_total_size(). If the size returned
> by dsa_get_total_size() (+ some memory used by TIDStore meta
> information) exceeds maintenance_work_mem, lazy vacuum starts to do
> index vacuuming and heap vacuuming. However, when allocating DSA memory
> for radix_tree_control at creation, we allocate 1MB
> (DSA_INITIAL_SEGMENT_SIZE) of DSM memory and take the memory required
> for radix_tree_control from it. So dsa_get_total_size() returns 1MB
> even if no TIDs have been collected yet.

2MB makes sense.

If the metadata is small, it seems counterproductive to count it towards
the total. We want the decision to be driven by blocks allocated. I have an
idea on that below.
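
To make that concrete, the check as the PoC describes it might look
something like this (a rough sketch only -- the struct fields and the
helper name are invented for illustration, not taken from the patch):

#include "postgres.h"
#include "miscadmin.h"              /* maintenance_work_mem */
#include "utils/dsa.h"              /* dsa_get_total_size() */

/* hypothetical stand-in for the PoC's TIDStore */
typedef struct TidStore
{
    dsa_area   *area;               /* the radix tree lives here */
    size_t      meta_size;          /* bookkeeping outside the DSA area */
} TidStore;

static bool
tidstore_is_full(TidStore *ts)
{
    /*
     * dsa_get_total_size() reports at least DSA_INITIAL_SEGMENT_SIZE
     * (1MB) even before any TIDs are stored, hence the 2MB minimum.
     */
    size_t      used = dsa_get_total_size(ts->area) + ts->meta_size;

    return used > (size_t) maintenance_work_mem * 1024;
}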

> > Remember when we discussed how we might approach parallel pruning? I
> > envisioned a local array of a few dozen kilobytes to reduce contention
> > on the tidstore. We could use such an array even for a single worker
> > (always doing the same thing is simpler anyway). When the array fills
> > up enough so that the next heap page *could* overflow it: Stop, insert
> > into the store, and check the store's memory usage before continuing.
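
Something like the following is what I had in mind (again only a
sketch; LocalTidBuf and the tidstore_* functions are invented names,
reusing the hypothetical TidStore from the sketch above):

#include "postgres.h"
#include "access/htup_details.h"    /* MaxHeapTuplesPerPage */
#include "storage/itemptr.h"        /* ItemPointerData */

#define LOCAL_TID_BUF_SIZE  4096    /* a few dozen kB of TIDs */

typedef struct LocalTidBuf
{
    int             ntids;
    ItemPointerData tids[LOCAL_TID_BUF_SIZE];
} LocalTidBuf;

/* hypothetical API into the shared store (see sketch above) */
extern void tidstore_add_tids(TidStore *ts, ItemPointerData *tids, int ntids);
extern bool tidstore_is_full(TidStore *ts);

/*
 * Called after pruning each heap page; returns true when the caller
 * should suspend the scan and do a round of index/heap vacuuming.
 */
static bool
maybe_flush_local_tids(TidStore *ts, LocalTidBuf *buf)
{
    /* flush only when the next heap page *could* overflow the array */
    if (buf->ntids <= LOCAL_TID_BUF_SIZE - MaxHeapTuplesPerPage)
        return false;

    tidstore_add_tids(ts, buf->tids, buf->ntids);
    buf->ntids = 0;

    /* check the shared store's memory usage only at flush time */
    return tidstore_is_full(ts);
}
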
>
> Right, I think it's not a problem in the slab case. In the DSA case,
> the new segment size follows a geometric series that approximately
> doubles the total storage each time we create a new segment. This
> behavior comes from the fact that the underlying DSM system isn't
> designed for large numbers of segments.

And taking a look, the size of a new segment can get quite large. It
seems we could test whether the total DSA area allocated is greater than
half of maintenance_work_mem. If that parameter is a power of two (the
common case) and >= 8MB, then the area will total just under that half
the last time it passes the test, and the next segment will bring it to
about 3/4 of the limit, like this:

maintenance_work_mem = 256MB, so stop if we go over 128MB:

2*(1+2+4+8+16+32) = 126MB -> keep going
126MB + 64MB = 190MB -> stop

That would be a simple way to be conservative with the memory limit. The
unfortunate aspect is that the last segment would be mostly wasted, but
it's paradise compared to the pessimistically-sized single array we have
now (even with Peter G.'s VM snapshot informing the allocation size, I
imagine).
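
In code form, the test could be as simple as this (a sketch under the
same invented names as above, not the patch's actual logic):

static bool
should_stop_for_index_vacuum(TidStore *ts)
{
    size_t      limit_bytes = (size_t) maintenance_work_mem * 1024;

    /*
     * Since each new DSA segment roughly doubles total storage,
     * stopping once we exceed half the limit means the final segment
     * cannot push us past the full limit.
     */
    return dsa_get_total_size(ts->area) > limit_bytes / 2;
}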

And as for the minimum possible maintenance_work_mem, I think this would
work with 2MB, if the community is okay with technically going over the
limit by a few bytes of overhead when a buildfarm animal is set to that
value. I imagine it would never go over the limit for realistic (and
even most unrealistic) values. Even with a VM snapshot page in memory
and small local arrays of TIDs, I think with this scheme we'd be well
under the limit.
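
Spelling that out with the same growth pattern as above (assuming I have
the DSA segment sizes right and the first two segments are 1MB each):

maintenance_work_mem = 2MB, so stop if we go over 1MB:

1MB -> keep going
1MB + 1MB = 2MB -> stop

so the DSA total lands exactly at the limit, and only the few bytes of
bookkeeping outside DSA could push us past it.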

After this feature is complete, I think we should consider a follow-on
patch to get rid of autovacuum_work_mem, since it would no longer be
needed.

--
John Naylor
EDB: http://www.enterprisedb.com
