Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-12-22 10:24:16
Message-ID: CAFBsxsG9gVMptcbYS+_Z49xz5iHxnKWYLqB73XcaDFVNoWEOfA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 21, 2022 at 3:09 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
>
> On Tue, Dec 20, 2022 at 3:09 PM John Naylor
> <john(dot)naylor(at)enterprisedb(dot)com> wrote:

> >
> > https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
> >
> > I'm guessing the hash join case can afford to be precise about memory
> > because it must spill to disk when exceeding work_mem. We don't have
> > that design constraint.
>
> You mean that the memory used by the radix tree should be limited not
> by the amount of memory actually used, but by the amount of memory
> allocated? In other words, it checks by MemoryContextMemAllocated() in
> the local case and by dsa_get_total_size() in the shared case.

I mean, if this patch set uses 10x less memory than v15 (not always, but
it's easy to find cases where it does), and if it's also expensive to track
memory use precisely, then we don't have much incentive to track it
precisely. Even if we did, we don't want to assume that every future caller
of the radix tree is willing to incur that cost.
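
To illustrate what I mean by coarse, allocator-level accounting, here is a
rough sketch (not from the patch -- the radix_tree fields, RadixTreeIsShared(),
and rt_memory_usage() are made-up names for illustration). It relies only on
the existing MemoryContextMemAllocated() and dsa_get_total_size() facilities
rather than tracking each allocation:

/*
 * Hypothetical sketch: report the tree's memory footprint from
 * allocator-level bookkeeping rather than precise per-entry tracking.
 */
static Size
rt_memory_usage(radix_tree *tree)
{
    if (RadixTreeIsShared(tree))
        return dsa_get_total_size(tree->dsa);       /* shared case */
    else
        return MemoryContextMemAllocated(tree->context, true); /* local case */
}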

> The idea of using up to half of maintenance_work_mem might be a good
> idea compared to the current flat-array solution. But since it only
> uses half, I'm concerned that there will be users who double their
> maintenance_work_mem. When it is improved, the user needs to restore
> maintenance_work_mem again.

I find it useful to step back and look at the usage patterns:

Autovacuum: Limiting the memory allocated by vacuum is important, since
there are multiple workers and they can run at any time (possibly most of
the time). This case will not use parallel index vacuum, so it will use the
slab allocator, where a quick estimate of the memory taken by the context
is not terribly far off, and we can afford to be more optimistic here.

Manual vacuum: The default configuration assumes we want to finish as soon
as possible (vacuum_cost_delay is zero). Parallel index vacuum can be used.
My experience leads me to believe users are willing to use a lot of memory
to make manual vacuum finish as quickly as possible, and are disappointed
to learn that even if maintenance work mem is 10GB, vacuum can only use 1GB.

So I don't believe anyone will have to double maintenance work mem after
upgrading (even with pessimistic accounting) because we'll be both
- much more efficient with memory on average
- free from the 1GB cap

That said, it's possible 50% is too pessimistic -- a 75% threshold would
bring us very close to powers of two, for example:

2*(1+2+4+8+16+32+64+128) + 256 = 766MB (74.8% of 1GB) -> keep going
766 + 256 = 1022MB -> stop

I'm not sure whether that calculation could cause us to go over the limit,
or how common that would be.
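
To make the overshoot question concrete, the kind of check I have in mind
looks something like this (illustrative only -- rt_memory_usage() is the
made-up helper from above, and dead_items stands in for vacuum's TID
storage):

/*
 * The check necessarily runs after an allocation has already grown the
 * tree, so the total can land above the threshold; whether it can also
 * exceed maintenance_work_mem itself depends on the size of that last
 * allocation.
 */
Size    limit = (Size) maintenance_work_mem * 1024;    /* GUC is in KB */
Size    threshold = limit - limit / 4;                 /* 75% of the limit */

if (rt_memory_usage(dead_items) >= threshold)
    /* stop collecting TIDs, vacuum indexes, then reset the tree */ ;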

--
John Naylor
EDB: http://www.enterprisedb.com
