Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-12-21 08:09:16
Message-ID: CAD21AoBffs5FpV=5WGU3v0jYY8R_AD8qDHgrBwrqak4ZzHwf-A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 20, 2022 at 3:09 PM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
>
> On Mon, Dec 19, 2022 at 2:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Dec 13, 2022 at 1:04 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > > Looking at other code using DSA, such as tidbitmap.c and nodeHash.c, it
> > > seems that they look only at memory that is actually dsa_allocate'd.
> > > To be exact, we estimate the number of hash buckets based on work_mem
> > > (and hash_mem_multiplier) and use it as the upper limit, and I've
> > > confirmed that the result of dsa_get_total_size() can exceed that
> > > limit. I'm not sure whether that is known and legitimate usage. If we
> > > can follow such usage, we can probably track how much dsa_allocate'd
> > > memory is used in the radix tree.
> >
> > I've experimented with this idea. The newly added 0008 patch changes
> > the radix tree so that it counts the memory usage in both the local
> > and shared cases. As shown below, there is some overhead for that:
> >
> > w/o 0008 patch
> > 298453544 | 282
>
> > w/ 0008 patch
> > 293603184 | 297
>
> This adds about as much overhead as the improvement I measured in the v4 slab allocator patch.

Oh, yes, that's bad.

> https://www.postgresql.org/message-id/20220704211822.kfxtzpcdmslzm2dy%40awork3.anarazel.de
>
> I'm guessing the hash join case can afford to be precise about memory because it must spill to disk when exceeding work_mem. We don't have that design constraint.

You mean that the memory used by the radix tree should be limited not
by the amount of memory actually used, but by the amount of memory
allocated? In other words, we would check MemoryContextMemAllocated()
in the local case and dsa_get_total_size() in the shared case.
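
For instance, a minimal sketch of such a check (rt_radix_tree and its
fields are assumed names; MemoryContextMemAllocated() and
dsa_get_total_size() are the real APIs):

    static bool
    rt_memory_limit_exceeded(rt_radix_tree *tree, Size limit)
    {
        Size        allocated;

        /* Count what is allocated, not what is actually used. */
        if (tree->dsa != NULL)
            allocated = dsa_get_total_size(tree->dsa);  /* shared case */
        else
            allocated = MemoryContextMemAllocated(tree->context, true);

        return allocated >= limit;
    }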

The idea of using up to half of maintenance_work_mem might be
acceptable compared to the current flat-array solution. But since only
half is used, I'm concerned that there will be users who double their
maintenance_work_mem to compensate; when this is later improved, those
users would need to restore maintenance_work_mem again.

A better solution would be a slab-like DSA that allocates the dynamic
shared memory in fixed-length large segments. However, the downside is
that, since the segment size is large, we would need to increase
maintenance_work_mem accordingly. Also, this patch set is already
getting bigger and more complicated, so I don't think it's a good idea
to add even more.
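
To illustrate that downside, here is the segment-granularity accounting
such a slab-like DSA would imply (the function name and segment size
are hypothetical):

    /*
     * With fixed-length segments, memory is reserved in whole-segment
     * steps, so the accounted size is the request rounded up to a
     * segment boundary. With e.g. 64MB segments, even a small tree
     * "costs" 64MB against the limit.
     */
    static inline Size
    rt_segments_reserved(Size requested, Size segment_size)
    {
        Size        nsegments = (requested + segment_size - 1) / segment_size;

        return nsegments * segment_size;
    }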

If instead we limit the memory usage by the amount of memory actually
used, we could use SlabStats() for the local case. Since DSA doesn't
have such functionality yet, we would need to add it, or we could have
the radix tree track it itself, but only in the shared case.
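
For example, a minimal sketch of the latter option, with assumed names
(only dsa_allocate() is a real API), charging each allocation against a
counter kept in the tree's shared control struct:

    static dsa_pointer
    rt_alloc_node(rt_radix_tree *tree, Size size)
    {
        dsa_pointer node = dsa_allocate(tree->dsa, size);

        /* mem_used is a hypothetical counter, decremented in rt_free_node() */
        tree->ctl->mem_used += size;

        return node;
    }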

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
