Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-11-14 08:43:40
Message-ID: CAD21AoCLvVkvf=0Jyu+wV9=GZTcEva+R0q-FzMU9viT9asoevA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 8, 2022 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sat, Nov 5, 2022 at 6:23 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> >
> > On Fri, Nov 4, 2022 at 10:25 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > For parallel heap pruning, multiple workers will insert key-value
> > > pairs to the radix tree concurrently. The simplest solution would be a
> > > single lock to protect writes but the performance will not be good.
> > > Another solution would be that we can divide the tables into multiple
> > > ranges so that keys derived from TIDs are not conflicted with each
> > > other and have parallel workers process one or more ranges. That way,
> > > parallel vacuum workers can build *sub-trees* and the leader process
> > > can merge them. In use cases of lazy vacuum, since the write phase and
> > > read phase are separated the readers don't need to worry about
> > > concurrent updates.
> >
> > It's a good idea to use ranges for a different reason -- readahead. See commit 56788d2156fc3, which aimed to improve readahead for sequential scans. It might work to use that as a model: Each worker prunes a range of 64 pages, keeping the dead tids in a local array. At the end of the range: lock the tid store, enter the tids into the store, unlock, free the local array, and get the next range from the leader. It's possible contention won't be too bad, and I suspect using small local arrays as-we-go would be faster and use less memory than merging multiple sub-trees at the end.
>
> Seems a promising idea. I think it might work well even in the current
> parallel vacuum (ie., single writer). I mean, I think we can have a
> single lwlock for shared cases in the first version. If the overhead
> of acquiring the lwlock per insertion of key-value is not negligible,
> we might want to try this idea.
>
> Apart from that, I'm going to incorporate the comments on 0004 patch
> and try a pointer tagging.

I'd like to share some progress on this work.

0004 patch is a new patch supporting a pointer tagging of the node
kind. Also, it introduces rt_node_ptr we discussed so that internal
functions use it rather than having two arguments for encoded and
decoded pointers. With this intermediate patch, the DSA support patch
became more readable and understandable. Probably we can make it
smaller further if we move the change of separating the control object
from radix_tree to the main patch (0002). The patch still needs to be
polished but I'd like to check if this idea is worthwhile. If we agree
on this direction, this patch will be merged into the main radix tree
implementation patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v9-0003-tool-for-measuring-radix-tree-performance.patch application/octet-stream 15.8 KB
v9-0004-PoC-tag-the-node-kind-to-rt_pointer.patch application/octet-stream 53.9 KB
v9-0005-PoC-DSA-support-for-radix-tree.patch application/octet-stream 37.7 KB
v9-0006-PoC-lazy-vacuum-integration.patch application/octet-stream 35.4 KB
v9-0002-Add-radix-implementation.patch application/octet-stream 82.9 KB
v9-0001-introduce-vector8_min-and-vector8_highbit_mask.patch application/octet-stream 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2022-11-14 08:58:10 RE: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Julien Rouhaud 2022-11-14 07:47:27 Re: Allow file inclusion in pg_hba and pg_ident files