Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-11-21 06:43:09
Message-ID: CAFBsxsFkKXwssodUGcjYTL7+Da4H6RRDyiWyajz9VgfEqz=VaA@mail.gmail.com

On Fri, Nov 18, 2022 at 8:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Nov 17, 2022 at 12:24 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Nov 16, 2022 at 4:39 PM John Naylor
> > <john(dot)naylor(at)enterprisedb(dot)com> wrote:

> > > That means my idea for the pointer struct might have some problems,
> > > at least as currently implemented. Maybe in the course of separating out
> > > and polishing that piece, an inefficiency will fall out. Or, it might be
> > > another reason to template local and shared separately. Not sure yet. I
> > > also haven't tried to adjust this test for the shared memory case.

Digging a bit deeper, I see a flaw in my benchmark: Even though the total
distribution of node kinds is decently even, the pattern that the benchmark
sees is not terribly random:

       3,343,352      branch-misses:u      #    0.85% of all branches
     393,204,959      branches:u

Recall a previous benchmark [1] where the leaf node was about half node16
and half node32. Randomizing the leaf node between the two caused branch
misses to go from 1% to 2%, causing a noticeable slowdown. Maybe in this
new benchmark, each level has a skewed distribution of nodes, giving a
smart branch predictor something to work with. We will need a way to
efficiently generate keys that lead to a relatively unpredictable
distribution of node kinds, as seen by a searcher. Especially in the leaves
(or just above the leaves), since those are less likely to be cached.
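
One possible shape for such a key generator is sketched below. It is
only an illustration, not code from the patch set: the function names,
the xorshift PRNG, and the fanouts of 16 and 32 (assumed to land a leaf
in node16 and node32 respectively, mirroring the earlier benchmark) are
all hypothetical. Each distinct upper-56-bit prefix gets a randomly
chosen number of children in its last byte, and the keys are then
shuffled so that lookup order is not correlated with prefix order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift64: small deterministic PRNG, adequate for benchmark input */
static uint64_t
xorshift64(uint64_t *state)
{
	uint64_t	x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* Hypothetical leaf fanouts, ASSUMED to map to node16 and node32 */
static const int leaf_fanouts[] = {16, 32};
#define N_FANOUTS (sizeof(leaf_fanouts) / sizeof(leaf_fanouts[0]))

/*
 * Fill keys[] with at most max_keys keys. For each prefix (upper 56 bits),
 * pick a fanout at random and emit that many keys differing only in the
 * last byte. Returns the number of keys written.
 */
static size_t
gen_mixed_leaf_keys(uint64_t *keys, size_t max_keys, uint64_t seed)
{
	uint64_t	state = seed ? seed : 1;	/* xorshift state must be nonzero */
	size_t		n = 0;

	assert(keys != NULL);

	for (uint64_t prefix = 0;; prefix++)
	{
		size_t		fanout = (size_t) leaf_fanouts[xorshift64(&state) % N_FANOUTS];

		if (n + fanout > max_keys)
			break;
		for (size_t i = 0; i < fanout; i++)
			keys[n++] = (prefix << 8) | (uint64_t) i;
	}
	return n;
}

/* Fisher-Yates shuffle, so the searcher visits leaves in random order */
static void
shuffle_keys(uint64_t *keys, size_t n, uint64_t seed)
{
	uint64_t	state = seed ? seed : 1;

	for (size_t i = n; i > 1; i--)
	{
		size_t		j = (size_t) (xorshift64(&state) % i);
		uint64_t	tmp = keys[i - 1];

		keys[i - 1] = keys[j];
		keys[j] = tmp;
	}
}
```

A benchmark would call gen_mixed_leaf_keys() to build the tree's input,
then shuffle_keys() on a copy to drive the lookups. Whether this really
defeats the branch predictor would still need to be confirmed with the
branch-miss counters; the same idea could be applied one level above
the leaves by randomizing how many prefixes share each parent.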

> > I'll also run the test on my environment and do the investigation
> > tomorrow.
> >
>
> FYI I've not tested the patch you shared today but here are the
> benchmark results I did with the v9 patch in my environment (I used
> the second filter). I split the 0004 patch into two patches: a pure
> refactoring patch to introduce rt_node_ptr and a patch to do
> pointer tagging.

Would you be able to share the refactoring patch? And a fix for the failing
tests? I'm thinking I want to try the templating approach fairly soon.

[1]
https://www.postgresql.org/message-id/CAFBsxsFEVckVzsBsfgGzGR4Yz%3DJp%3DUxOtjYvTjOz6fOoLXtOig%40mail.gmail.com

--
John Naylor
EDB: http://www.enterprisedb.com
