Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-09-30 15:46:54
Message-ID: CAD21AoDg8S1QP-YR4132oPazPSvhdn-uArDF1AGEQiAYKhHfqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 28, 2022 at 3:18 PM John Naylor
<john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> On Wed, Sep 28, 2022 at 10:49 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > BTW We need to consider not only aset/slab but also DSA since we
> > allocate dead tuple TIDs on DSM in parallel vacuum cases. FYI DSA uses
> > the following size classes:
> >
> > static const uint16 dsa_size_classes[] = {
> > [...]
>
> Thanks for that info -- I wasn't familiar with the details of DSA. For the non-parallel case, I plan to at least benchmark using aset because I gather it's the most heavily optimized. I'm thinking that will allow other problem areas to be more prominent. I'll also want to compare total context size compared to slab to see if possibly less fragmentation makes up for other wastage.

Thanks!

>
> Along those lines, one thing I've been thinking about is the number of size classes. There is a tradeoff between memory efficiency and number of branches when searching/inserting. My current thinking is there is too much coupling between size class and data type. Each size class currently uses a different data type and a different algorithm to search and set it, which in turn requires another branch. We've found that a larger number of size classes leads to poor branch prediction [1] and (I imagine) code density.
>
> I'm thinking we can use "flexible array members" for the values/pointers, and keep the rest of the control data in the struct the same. That way, we never have more than 4 actual "kinds" to code and branch on. As a bonus, when migrating a node to a larger size class of the same kind, we can simply repalloc() to the next size.

Interesting idea. Using flexible array members for values would be
good also for the case in the future where we want to support other
value types than uint64.

With this idea, we can just repalloc() to grow to the larger size in a
pair but I'm slightly concerned that the more size class we use, the
more frequent the node needs to grow. If we want to support node
shrink, the deletion is also affected.

> To show what I mean, consider this new table:
>
> node2: 5 + 6 +(5)+ 2*8 = 32 bytes
> node6: 5 + 6 +(5)+ 6*8 = 64
>
> node12: 5 + 27 + 12*8 = 128
> node27: 5 + 27 + 27*8 = 248(->256)
>
> node91: 5 + 256 + 28 +(7)+ 91*8 = 1024
> node219: 5 + 256 + 28 +(7)+219*8 = 2048
>
> node256: 5 + 32 +(3)+256*8 = 2088(->4096)
>
> Seven size classes are grouped into the four kinds.
>
> The common base at the front is here 5 bytes because there is a new uint8 field for "capacity", which we can ignore for node256 since we assume we can always insert/update that node. The control data is the same in each pair, and so the offset to the pointer/value array is the same. Thus, migration would look something like:

I think we can use a bitfield for capacity. That way, we can pack
count (9bits), kind (2bits)and capacity (4bits) in uint16.

> Somewhat unrelated, we could still implement Andres' idea [1] to dispense with the isset array in inner nodes of the indirect array type (now node128), since we can just test if the pointer is null.

Right. I didn't do that to use the common logic for inner node128 and
leaf node128.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2022-09-30 15:58:31 Re: Tracking last scan time
Previous Message Tom Lane 2022-09-30 15:45:35 Re: [PATCH v1] [meson] add a default option prefix=/usr/local/pgsql