Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-11-03 04:59:33
Message-ID: CAFBsxsGV9ssgP-itGa_TVux0v=_B16-28ff+q_G2WLCgBstSLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 31, 2022 at 12:47 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
>
> I've attached v8 patches. 0001, 0002, and 0003 patches incorporated
> the comments I got so far. 0004 patch is a DSA support patch for PoC.

Thanks for the new patchset. This is not a full review, but I have some
comments:

0001 and 0002 look okay on a quick scan -- I will use this as a base for
further work that we discussed. However, before I do so I'd like to request
another revision regarding the following:

> In 0004 patch, the basic idea is to use rt_node_ptr in all inner nodes
> to point its children, and we use rt_node_ptr as either rt_node* or
> dsa_pointer depending on whether the radix tree is shared or not (ie,
> by checking radix_tree->dsa == NULL).

0004: Looks like a good start, but this patch has a large number of changes
like these, making it hard to read:

- if (found && child_p)
- *child_p = child;
+ if (found && childp_p)
+ *childp_p = childp;
...
rt_node_inner_32 *new32;
+ rt_node_ptr new32p;

/* grow node from 4 to 32 */
- new32 = (rt_node_inner_32 *) rt_copy_node(tree, (rt_node *) n4,
- RT_NODE_KIND_32);
+ new32p = rt_copy_node(tree, (rt_node *) n4, RT_NODE_KIND_32);
+ new32 = (rt_node_inner_32 *) node_ptr_get_local(tree, new32p);

It's difficult to keep in my head what all the variables refer to. I
thought a bit about how to split this patch up to make this easier to read.
Here's what I came up with:

typedef struct rt_node_ptr
{
uintptr_t encoded;
rt_node * decoded;
}

Note that there is nothing about "dsa or local". That's deliberate. That
way, we can use the "encoded" field for a tagged pointer as well, as I hope
we can do (at least for local pointers) in the future. So an intermediate
patch would have "static inline void" functions node_ptr_encode() and
node_ptr_decode(), which would only copy from one member to another. I
suspect that: 1. The actual DSA changes will be *much* smaller and easier
to reason about. 2. Experimenting with tagged pointers will be easier.

Also, quick question: 0004 has a new function rt_node_update_inner() -- is
that necessary because of DSA?, or does this ideally belong in 0002? What's
the reason for it?

Regarding the performance, I've
> added another boolean argument to bench_seq/shuffle_search(),
> specifying whether to use the shared radix tree or not. Here are
> benchmark results in my environment,

> [...]

> In non-shared radix tree cases (the forth argument is false), I don't
> see a visible performance degradation. On the other hand, in shared
> radix tree cases (the forth argument is true), I see visible overheads
> because of dsa_get_address().

Thanks, this is useful.

> Please note that the current shared radix tree implementation doesn't
> support any locking, so it cannot be read while written by someone.

I think at the very least we need a global lock to enforce this.

> Also, only one process can iterate over the shared radix tree. When it
> comes to parallel vacuum, these don't become restriction as the leader
> process writes the radix tree while scanning heap and the radix tree
> is read by multiple processes while vacuuming indexes. And only the
> leader process can do heap vacuum by iterating the key-value pairs in
> the radix tree. If we want to use it for other cases too, we would
> need to support locking, RCU or something.

A useful exercise here is to think about what we'd need to do parallel heap
pruning. We don't need to go that far for v16 of course, but what's the
simplest thing we can do to make that possible? Other use cases can change
to more sophisticated schemes if need be.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-11-03 05:40:19 Re: Incorrect include file order in guc-file.l
Previous Message Amit Kapila 2022-11-03 04:11:46 Re: [BUG] Logical replica crash if there was an error in a function.