Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-12-09 01:19:29
Message-ID: CAD21AoBJe0NeKmzuJNqiXzEvt8gnp6WgcrzUO+qmjHGJXD0_Rw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 6, 2022 at 7:32 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> On Fri, Dec 2, 2022 at 11:42 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > On Mon, Nov 14, 2022 at 7:59 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> > > >
> > > > - Optimize node128 insert.
> > >
> > > I've attached a rough start at this. The basic idea is borrowed from our bitmapset nodes, so we can iterate over and operate on word-sized (32- or 64-bit) types at a time, rather than bytes.
> >
> > Thanks! I think this is a good idea.
> >
> > > To make this easier, I've moved some of the lower-level macros and types from bitmapset.h/.c to pg_bitutils.h. That's probably going to need a separate email thread to resolve the coding style clash this causes, so that can be put off for later.
>
> I started a separate thread [1], and 0002 comes from feedback on that. There is a FIXME about using WORDNUM and BITNUM, at least with that spelling. I'm putting that off to ease rebasing the rest as v13 -- getting some CI testing with 0002 seems like a good idea. There are no other changes yet. Next, I will take a look at templating local vs. shared memory. I might try basing that on the styles of both v12 and v8, and see which one works best with templating.

Thank you so much!

In the meantime, I've been working on vacuum integration. There are
two things I'd like to discuss at some point:

The first is the minimum value of maintenance_work_mem, 1MB. Since the
initial DSA segment size is also 1MB (DSA_INITIAL_SEGMENT_SIZE),
parallel vacuum with the radix tree cannot work at the minimum
maintenance_work_mem. We would need to raise the minimum to 4MB or so.
Maybe we can start a new thread for that.

The second is how to limit the size of the radix tree to
maintenance_work_mem. I think it's tricky to estimate the maximum
number of keys that fit in maintenance_work_mem, because the radix
tree size varies depending on the key distribution. Another idea I
considered is to enforce the limit when inserting a key. To strictly
limit the radix tree size, we would probably have to change rt_set()
so that it bails out and returns false if the radix tree size is about
to exceed the memory limit when we allocate a new node or grow a node
kind/class. Ideally, I'd like to control the size outside of the radix
tree (e.g., in TIDStore), since such a check could add overhead to
rt_set(), but we probably need to add this logic to the radix tree
itself.
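
To illustrate the second point, here is a minimal sketch of an insert
that refuses to allocate once a memory budget would be exceeded. It is
not the patch's rt_set(): toy_tree, toy_leaf, toy_init(), toy_set(),
toy_is_set() and toy_free() are hypothetical names, and the "tree" is
just a fixed two-level structure over 16-bit keys. It only shows where
the budget check would sit relative to node allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical leaf node: a 256-bit bitmap covering the low key byte. */
typedef struct toy_leaf
{
	uint64_t	bits[4];
} toy_leaf;

/* Hypothetical two-level tree over 16-bit keys, with memory accounting. */
typedef struct toy_tree
{
	toy_leaf   *leaves[256];	/* indexed by the high key byte */
	size_t		mem_used;		/* bytes allocated for leaves so far */
	size_t		mem_limit;		/* budget, e.g. from maintenance_work_mem */
} toy_tree;

void
toy_init(toy_tree *tree, size_t mem_limit)
{
	assert(mem_limit > 0);
	for (int i = 0; i < 256; i++)
		tree->leaves[i] = NULL;
	tree->mem_used = 0;
	tree->mem_limit = mem_limit;
}

/*
 * Set a key. Returns false, leaving the tree unchanged, if a new leaf
 * would be needed and allocating it would exceed the budget.
 */
bool
toy_set(toy_tree *tree, uint16_t key)
{
	unsigned	hi = key >> 8;
	unsigned	lo = key & 0xFF;

	if (tree->leaves[hi] == NULL)
	{
		/* The check runs only on the allocation path, not on every insert. */
		if (tree->mem_used + sizeof(toy_leaf) > tree->mem_limit)
			return false;

		tree->leaves[hi] = calloc(1, sizeof(toy_leaf));
		if (tree->leaves[hi] == NULL)
			return false;
		tree->mem_used += sizeof(toy_leaf);
	}

	tree->leaves[hi]->bits[lo / 64] |= UINT64_C(1) << (lo % 64);
	return true;
}

bool
toy_is_set(const toy_tree *tree, uint16_t key)
{
	const toy_leaf *leaf = tree->leaves[key >> 8];
	unsigned	lo = key & 0xFF;

	return leaf != NULL && (leaf->bits[lo / 64] & (UINT64_C(1) << (lo % 64))) != 0;
}

void
toy_free(toy_tree *tree)
{
	for (int i = 0; i < 256; i++)
	{
		free(tree->leaves[i]);
		tree->leaves[i] = NULL;
	}
	tree->mem_used = 0;
}
```

In this sketch a caller would treat a false return as "the store is
full" and, in the vacuum case, presumably run an index vacuum cycle
before retrying. Because the check sits only on the allocation path,
inserts into existing nodes pay nothing extra, which is one way the
overhead concern above might be kept small.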

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
