Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-05-23 23:16:54
Message-ID: CAFBsxsFyWLxweHVDtKb7otOCR4XdQGYR4b+9svxpVFnJs08BmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> the current insert/delete paths are quite complex. Using bitmap heap scan
as a motivating use case, I hope to refocus complexity to where it's most
needed, and aggressively simplify where possible.

Sometime in the not-too-distant future, I will start a new thread focusing
on bitmap heap scan, but for now, I just want to share some progress on
making the radix tree usable not only for that, but hopefully a wider range
of applications, while making the code simpler and the binary smaller. The
attached patches are incomplete (e.g. no iteration) and quite a bit messy,
so tar'd and gzip'd for the curious (should apply on top of v32 0001-03 +
0007-09 ).

0001

This combines a few concepts that I didn't bother separating out after the
fact:
- Split insert_impl.h into multiple functions for improved readability and
maintainability.
- Use single-value leaves as the basis for storing values, with the goal to
get to "combined pointer-value slots" for efficiency and flexibility.
- With the latter in mind, searching the child within a node now returns
the address of the slot. This allows the same interface whether the slot
contains a child pointer or a value.
- Starting with RT_SET, start turning some iterative algorithms into
recursive ones. This is a more natural way to traverse a tree structure,
and we already see an advantage: Previously when growing a node, we
searched within the parent to update its reference to the new node, because
we didn't know the slot we descended from. Now we can simply update a
single variable.
- Since we recursively pass the "shift" down the stack, it doesn't have to
be stored in any node -- only the "top-level" start shift is stored in the
tree control struct. This was easy to code since the node's shift value was
hardly ever accessed anyway! The node header shrinks from 5 bytes to 4.

0002

Back in v15, we tried keeping DSA/local pointers as members of a struct. I
did not like the result, but still thought it was a good idea. RT_DELETE is
a complex function and I didn't want to try rewriting it without a pointer
abstraction, so I've resurrected this idea, but in a simpler, less
intrusive way. A key difference from v15 is using a union type for the
non-shmem case.

0004

Rewrite RT_DELETE using recursion. I find this simpler than the previous
open-coded stack.

0005-06

Deletion has an inefficiency: One function searches for the child to see if
it's there, then another function searches for it again to delete it. Since
0001, a successful child search returns the address of the slot, so we can
save it. For the two smaller "linear search" node kinds we can then use a
single subtraction to compute the chunk/slot index for deletion. Also,
split RT_NODE_DELETE_INNER into separate functions, for a similar reason as
the insert case in 0001.

0007

Anticipate node shrinking: If only one node-kind needs to be freed, we can
move a branch to that one code path, rather than every place where RT_FREE
is inlined.

0009

Teach node256 how to shrink *. Since we know the number of children in a
node256 can't possibly be zero, we can use uint8 to store the count and
interpret an overflow to zero as 256 for this node. The node header shrinks
from 4 bytes to 3.

* Other nodes will follow in due time, but only after I figure out how to
do it nicely (ideas welcome!) -- currently node32's two size classes work
fine for growing, but the code should be simplified before extending to
other cases.)

0010

Limited support for "combined pointer-value slots". At compile-time, choose
either that or "single-value leaves" based on the size of the value type
template parameter. Values that are pointer-sized or less can fit in the
last-level child slots of nominal "inner nodes" without duplicated
leaf-node code. Node256 now must act like the previous 'node256 leaf',
since zero is a valid value. Aside from that, this was a small change.

What I've shared here could work (in principal, since it uses uint64
values) for tidstore, possibly faster (untested) because of better code
density, but as mentioned I want to shoot for higher. For tidbitmap.c, I
want to extend this idea and branch at run-time on a per-value basis, so
that a page-table entry that fits in a pointer can go there, and if not,
it'll be a full leaf. (This technique enables more flexibility in
lossifying pages as well.) Run-time info will require e.g. an additional
bit per slot. Since the node header is now 3 bytes, we can spare one more
byte in the node3 case. In addition, we can and should also bump it back up
to node4, still keeping the metadata within 8 bytes (no struct padding).

I've started in this patchset to refer to the node kinds as "4/16/48/256",
regardless of their actual fanout. This is for readability (by matching the
language in the paper) and maintainability (should *not* ever change
again). The size classes (including multiple classes per kind) could be
determined by macros and #ifdef's. For example, in non-SIMD architectures,
it's likely slow to search an array of 32 key chunks, so in that case the
compiler should choose size classes similar to these four nominal kinds.

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v33-ART.tar.gz application/gzip 26.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-05-24 00:17:16 Re: RFI: Extending the TOAST Pointer
Previous Message Jonathan S. Katz 2023-05-23 22:38:37 Re: PG 16 draft release notes ready