Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-12-22 14:59:22
Message-ID: CAD21AoAnQnZPJ63-vd_sUJgnc7L818x1oCnaf7vupt7J3JLo+w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 22, 2022 at 7:24 PM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
>
> On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 20, 2022 at 3:09 PM John Naylor
> > <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> > > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> > >
> > > I'm guessing the hash join case can afford to be precise about memory because it must spill to disk when exceeding workmem. We don't have that design constraint.
> >
> > You mean that the memory used by the radix tree should be limited not
> > by the amount of memory actually used, but by the amount of memory
> > allocated? In other words, it would check MemoryContextMemAllocated() in
> > the local case and dsa_get_total_size() in the shared case.
>
> I mean, if this patch set uses 10x less memory than v15 (not always, but easy to find cases where it does), and if it's also expensive to track memory use precisely, then we don't have an incentive to track memory precisely. Even if we did, we don't want to assume that every future caller of radix tree is willing to incur that cost.

Understood.
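
Just to illustrate what I had in mind, here is a minimal sketch (the
struct and function names are hypothetical, not the actual patch API):
it reports memory allocated for the tree, using
MemoryContextMemAllocated() for a backend-local tree and
dsa_get_total_size() for a shared tree.

#include "postgres.h"
#include "utils/dsa.h"
#include "utils/memutils.h"

/* hypothetical tree struct, only the fields relevant here */
typedef struct radix_tree
{
    MemoryContext context;  /* used when the tree is backend-local */
    dsa_area   *dsa;        /* used when the tree lives in shared memory */
} radix_tree;

/* report memory allocated for the tree, not memory strictly in use */
static Size
rt_memory_usage(radix_tree *tree)
{
    if (tree->dsa != NULL)
        return dsa_get_total_size(tree->dsa);
    return (Size) MemoryContextMemAllocated(tree->context, true);
}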

>
> > The idea of using up to half of maintenance_work_mem might be a good
> > idea compared to the current flat-array solution. But since it only
> > uses half, I'm concerned that there will be users who double their
> > maintenance_work_mem. When this is improved later, those users would
> > need to restore maintenance_work_mem again.
>
> I find it useful to step back and look at the usage patterns:
>
> Autovacuum: Limiting the memory allocated by vacuum is important, since there are multiple workers and they can run at any time (possibly most of the time). This case will not use parallel index vacuum, so will use slab, where the quick estimation of memory taken by the context is not terribly far off, so we can afford to be more optimistic here.
>
> Manual vacuum: The default configuration assumes we want to finish as soon as possible (vacuum_cost_delay is zero). Parallel index vacuum can be used. My experience leads me to believe users are willing to use a lot of memory to make manual vacuum finish as quickly as possible, and are disappointed to learn that even if maintenance work mem is 10GB, vacuum can only use 1GB.

Agreed.

> So I don't believe anyone will have to double maintenance work mem after upgrading (even with pessimistic accounting) because we'll be both
> - much more efficient with memory on average
> - free from the 1GB cap

Makes sense.

>
> That said, it's possible 50% is too pessimistic -- a 75% threshold will bring us very close to powers of two for example:
>
> 2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
> 766 + 256 = 1022MB -> stop
>
> I'm not sure if that calculation could cause us to go over the limit, or how common that would be.
>

If the value is a power of 2, it seems to work perfectly fine. But if
it's, say, 700MB, the total memory exceeds the limit:

2*(1+2+4+8+16+32+64+128) = 510MB (72.8% of 700MB) -> keep going
510 + 256 = 766MB -> stop, but it exceeds the 700MB limit.

In a bigger case, if it's 11000MB:

2*(1+2+...+2048) = 8190MB (74.4%)
8190 + 4096 = 12286MB -> stop, but it exceeds the limit.

That being said, I don't think these are common cases, so the 75%
threshold seems to work fine in most cases.
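
For reference, here is a toy simulation of that stop condition (purely
illustrative; the two-allocations-per-size growth pattern and the 75%
threshold are just the assumptions from the examples above, not the
patch code). With a 700MB limit it stops at 766MB, with 1024MB at
1022MB, and with 11000MB at 12286MB:

#include <stdio.h>

int
main(void)
{
    long    limit_mb = 700;     /* e.g. maintenance_work_mem in MB */
    long    allocated_mb = 0;
    long    block_mb = 1;
    int     at_this_size = 0;

    for (;;)
    {
        /* stop before the next allocation once we are past 75% */
        if (allocated_mb >= limit_mb * 0.75)
            break;

        allocated_mb += block_mb;

        /* two allocations of each size, then the size doubles */
        if (++at_this_size == 2)
        {
            block_mb *= 2;
            at_this_size = 0;
        }
    }

    printf("stopped at %ldMB (limit %ldMB)\n", allocated_mb, limit_mb);
    return 0;
}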

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
