Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-10-06 07:52:26
Message-ID: CAD21AoDD5VXYMyWdrSjtN01HdYq3Ja-yzTP=rD9315-5bdLzpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 5, 2022 at 6:40 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
>
> On Wed, Oct 5, 2022 at 1:46 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Sep 28, 2022 at 12:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Sep 23, 2022 at 12:11 AM John Naylor
> > > <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> > > Yeah, node31 and node256 are bloated. We probably could use slab for
> > > node256 independently. It's worth trying a benchmark to see how it
> > > affects the performance and the tree size.
>
> This wasn't the focus of your current email, but while experimenting with v6 I had another thought about local allocation: If we use the default slab block size of 8192 bytes, then only 3 chunks of size 2088 can fit, right? If so, since aset and DSA also waste at least a few hundred bytes, we could store a useless 256-byte slot array within node256. That way, node128 and node256 share the same start of pointers/values array, so there would be one less branch for getting that address. In v6, rt_node_get_values and rt_node_get_children are not inlined (asde: gcc uses a jump table for 5 kinds but not for 4), but possibly should be, and the smaller the better.

It would be good for performance but I'm a bit concerned that it's
highly optimized to the design of aset and DSA. Since size 2088 will
be currently classed as 2616 in DSA, DSA wastes 528 bytes. However, if
we introduce a new class of 2304 (=2048 + 256) bytes we cannot store a
useless 256-byte and the assumption will be broken.

>
> > Regarding DSA support, IIUC we need to use dsa_pointer in inner nodes
> > to point to its child nodes, instead of C pointers (ig, backend-local
> > address). I'm thinking of a straightforward approach as the first
> > step; inner nodes have a union of rt_node* and dsa_pointer and we
> > choose either one based on whether the radix tree is shared or not. We
> > allocate and free the shared memory for individual nodes by
> > dsa_allocate() and dsa_free(), respectively. Therefore we need to get
> > a C pointer from dsa_pointer by using dsa_get_address() while
> > descending the tree. I'm a bit concerned that calling
> > dsa_get_address() for every descent could be performance overhead but
> > I'm going to measure it anyway.
>
> Are dsa pointers aligned the same as pointers to locally allocated memory? Meaning, is the offset portion always a multiple of 4 (or 8)?

I think so.

> It seems that way from a glance, but I can't say for sure. If the lower 2 bits of a DSA pointer are never set, we can tag them the same way as a regular pointer. That same technique could help hide the latency of converting the pointer, by the same way it would hide the latency of loading parts of a node into CPU registers.
>
> One concern is, handling both local and dsa cases in the same code requires more (predictable) branches and reduces code density. That might be a reason in favor of templating to handle each case in its own translation unit.

Right. We also need to support locking for shared radix tree, which
would require more branches.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2022-10-06 08:03:33 Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Previous Message Alvaro Herrera 2022-10-06 07:32:36 Re: shadow variables - pg15 edition