Re: Adding basic NUMA awareness

From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Burd <greg(at)burd(dot)me>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Subject: Re: Adding basic NUMA awareness
Date: 2025-07-09 17:23:06
Message-ID: ndvygkpdx44pmi4xbkf52gfrl77cohpefr42tipvd5dgiaeuyd@fe2og2kxyjnc
Lists: pgsql-hackers

Hi,

On 2025-07-09 12:55:51 -0400, Greg Burd wrote:
> On Jul 9 2025, at 12:35 pm, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > FWIW, I've started to wonder if we shouldn't just get rid of the freelist
> > entirely. While clock sweep is perhaps minutely slower in a single thread than
> > the freelist, clock sweep scales *considerably* better [1]. As it's rather
> > rare to be bottlenecked on clock sweep speed for a single thread (rather than
> > IO or memory copy overhead), I think it's worth favoring clock sweep.
>
> Hey Andres, thanks for spending time on this. I've worked before on
> freelist implementations (last one in LMDB) and I think you're onto
> something. I think it's an innovative idea and that the speed
> difference will either be lost in the noise or potentially entirely
> mitigated by avoiding duplicate work.

Agreed. FWIW, just using clock sweep actually makes things like DROP TABLE
perform better because it doesn't need to maintain the freelist anymore...

> > Also needing to switch between getting buffers from the freelist and the sweep
> > makes the code more expensive. I think just having the buffer in the sweep,
> > with a refcount / usagecount of zero would suffice.
>
> If you're not already coding this, I'll jump in. :)

My experimental patch is literally a four-character addition ;), namely adding
"0 &&" to the relevant code in StrategyGetBuffer().

Obviously a real patch would need to do some more work than that. Feel free
to take on that project, I am not planning on tackling that in the near term.
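
To illustrate the "0 &&" experiment described above, here is a rough,
self-contained sketch. It is a toy model under stated assumptions, not the
PostgreSQL code and not the actual patch: ToyBuffer, ToyStrategy,
toy_init() and toy_get_buffer() are hypothetical names, and the use_freelist
flag stands in for the "0 &&" that short-circuits the freelist branch.

```c
#include <assert.h>
#include <stdbool.h>

#define NBUFFERS        8
#define MAX_USAGE_COUNT 5

/* Hypothetical, simplified buffer header (not PostgreSQL's BufferDesc). */
typedef struct ToyBuffer
{
    int refcount;    /* number of pins held on the buffer */
    int usage_count; /* bumped on access, decremented by the sweep */
    int free_next;   /* next freelist entry, -1 at the end of the list */
} ToyBuffer;

typedef struct ToyStrategy
{
    ToyBuffer buffers[NBUFFERS];
    int       first_free;  /* freelist head, -1 when the list is empty */
    unsigned  next_victim; /* clock hand */
} ToyStrategy;

/* Start with every buffer unused and chained into the freelist. */
static void
toy_init(ToyStrategy *s)
{
    for (int i = 0; i < NBUFFERS; i++)
    {
        s->buffers[i].refcount = 0;
        s->buffers[i].usage_count = 0;
        s->buffers[i].free_next = (i + 1 < NBUFFERS) ? i + 1 : -1;
    }
    s->first_free = 0;
    s->next_victim = 0;
}

/*
 * Return a victim buffer id, or -1 if every buffer is pinned.
 * Passing use_freelist = false mimics prefixing the freelist
 * condition with "0 &&": the branch is never taken and only the
 * clock sweep runs. Pinning the result is left to the caller.
 */
static int
toy_get_buffer(ToyStrategy *s, bool use_freelist)
{
    if (use_freelist && s->first_free >= 0)
    {
        int id = s->first_free;

        s->first_free = s->buffers[id].free_next;
        s->buffers[id].free_next = -1;
        return id;
    }

    /* Bounded sweep: enough ticks to age every unpinned buffer to zero. */
    for (int tries = 0; tries < NBUFFERS * (MAX_USAGE_COUNT + 1); tries++)
    {
        int        id = (int) (s->next_victim++ % NBUFFERS);
        ToyBuffer *buf = &s->buffers[id];

        assert(buf->usage_count <= MAX_USAGE_COUNT);
        if (buf->refcount == 0)
        {
            if (buf->usage_count == 0)
                return id;      /* unused and unpinned: take it */
            buf->usage_count--; /* give it another lap */
        }
    }
    return -1;
}
```

In this toy model a buffer with refcount and usage_count both zero is found
by the sweep without any freelist bookkeeping, which is the property the
quoted text relies on.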

There are other things around this that could use some attention. It's not hard
to see clock sweep be a bottleneck in concurrent workloads - partially due to
the shared maintenance of the clock hand. A NUMAed clock sweep would address
that. However, we also maintain StrategyControl->numBufferAllocs, which is a
significant contention point and would not necessarily be removed by a
NUMAification of the clock sweep.
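
As a rough illustration of the two contention points mentioned above, the
following sketch splits the clock hand into per-partition hands and does the
same for the allocation counter. Everything here is an assumption made for
the example: SweepPartition, NPARTITIONS (imagined as one per NUMA node),
partition_clock_tick(), partition_count_alloc() and total_allocs() are
hypothetical names, C11 atomics stand in for PostgreSQL's own atomics API,
and summing per-partition counters is just one possible way to reduce
contention on a shared counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NBUFFERS       1024
#define NPARTITIONS    4   /* hypothetical: one partition per NUMA node */
#define PARTITION_SIZE (NBUFFERS / NPARTITIONS)
#define CACHE_LINE     64

/*
 * Per-partition sweep state. The alignment keeps each partition on its
 * own cache line, so backends working in different partitions do not
 * bounce the same line between CPUs.
 */
typedef struct SweepPartition
{
    _Alignas(CACHE_LINE) atomic_uint_fast32_t next_victim; /* clock hand */
    atomic_uint_fast64_t num_allocs;                       /* local counter */
} SweepPartition;

/* Static storage is zero-initialized: all hands and counters start at 0. */
static SweepPartition partitions[NPARTITIONS];

/* Advance one partition's clock hand and return a global buffer id. */
static uint32_t
partition_clock_tick(int part)
{
    uint_fast32_t tick;

    assert(part >= 0 && part < NPARTITIONS);
    tick = atomic_fetch_add_explicit(&partitions[part].next_victim, 1,
                                     memory_order_relaxed);
    return (uint32_t) (part * PARTITION_SIZE + tick % PARTITION_SIZE);
}

/* Count one buffer allocation against the caller's partition only. */
static void
partition_count_alloc(int part)
{
    assert(part >= 0 && part < NPARTITIONS);
    atomic_fetch_add_explicit(&partitions[part].num_allocs, 1,
                              memory_order_relaxed);
}

/* Readers that want the global figure sum the per-partition counters. */
static uint64_t
total_allocs(void)
{
    uint64_t sum = 0;

    for (int i = 0; i < NPARTITIONS; i++)
        sum += atomic_load_explicit(&partitions[i].num_allocs,
                                    memory_order_relaxed);
    return sum;
}
```

The sum returned by total_allocs() is not an atomic snapshot across
partitions, which is presumably acceptable for a statistic but would need
checking against how the counter is actually consumed.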

Greetings,

Andres Freund
