Re: slab allocator performance issues

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: slab allocator performance issues
Date: 2022-12-14 10:37:52
Message-ID: CAFBsxsEby=vzxX31Rc5-XjkgXFs2UygY7OAHr-Az600NcgSR9A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 13, 2022 at 7:50 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
>
> Thanks for testing the patch.
>
> On Mon, 12 Dec 2022 at 20:14, John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
> >
> > While allocation is markedly improved, freeing looks worse here. The
> > proportion is surprising because only about 2% of nodes are freed during
> > the load, but doing that takes up 10-40% of the time compared to allocating.
>
> I've tried to reproduce this with the v13 patches applied and I'm not
> really getting the same as you are. To run the function 100 times I
> used:
>
> select x, a.* from generate_series(1,100) x(x), lateral (select * from
> bench_load_random_int(500 * 1000 * (1+x-x))) a;

Simply running over a longer period of time like this makes the SlabFree
difference much closer to your results, so it doesn't seem out of line
anymore. Here SlabAlloc seems to take maybe 2/3 of the time of current
slab, with a 5% reduction in total time:

500k ints:

v13-0001-0005
average of 30: 217ms

47.61% postgres postgres [.] rt_set
20.99% postgres postgres [.] SlabAlloc
10.00% postgres postgres [.] rt_node_insert_inner.isra.0
6.87% postgres [unknown] [k] 0xffffffffbce011b7
3.53% postgres postgres [.] MemoryContextAlloc
2.82% postgres postgres [.] SlabFree

+slab v4
average of 30: 206ms

51.13% postgres postgres [.] rt_set
14.08% postgres postgres [.] SlabAlloc
11.41% postgres postgres [.] rt_node_insert_inner.isra.0
7.44% postgres [unknown] [k] 0xffffffffbce011b7
3.89% postgres postgres [.] MemoryContextAlloc
3.39% postgres postgres [.] SlabFree
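
As a rough cross-check of the "2/3" and "5%" figures, the per-function times can be estimated from the two profiles above. This is only a sketch: it assumes a function's share of perf samples approximates its share of the average run time, and the dictionary keys and the helper name est_ms are made up for illustration.

```python
# Figures copied from the two perf profiles above (500k ints, average of 30 runs).
# Assumption: a function's share of perf samples approximates its share of the
# average run time, so percent * total_ms is only a rough per-function estimate.
profiles = {
    "v13": {"total_ms": 217, "SlabAlloc": 20.99, "SlabFree": 2.82},
    "slab_v4": {"total_ms": 206, "SlabAlloc": 14.08, "SlabFree": 3.39},
}


def est_ms(name, func):
    """Estimated milliseconds spent in func for the given build (hypothetical helper)."""
    p = profiles[name]
    return p["total_ms"] * p[func] / 100.0


# Estimated SlabAlloc time with slab v4 relative to plain v13.
alloc_ratio = est_ms("slab_v4", "SlabAlloc") / est_ms("v13", "SlabAlloc")

# Relative reduction in the average total time.
total_reduction = 1 - profiles["slab_v4"]["total_ms"] / profiles["v13"]["total_ms"]

# Time spent freeing relative to time spent allocating, per build.
free_vs_alloc = {n: est_ms(n, "SlabFree") / est_ms(n, "SlabAlloc") for n in profiles}

print(f"SlabAlloc, slab v4 / v13: {alloc_ratio:.2f}")
print(f"total time reduction: {total_reduction:.1%}")
for name, ratio in free_vs_alloc.items():
    print(f"{name}: SlabFree / SlabAlloc = {ratio:.2f}")
```

With these inputs the SlabAlloc ratio comes out a little under 2/3 and the total reduction at about 5%. The estimated SlabFree time is roughly 13% of SlabAlloc on v13 and 24% with slab v4, which falls inside the 10-40% range mentioned earlier in the thread.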

It doesn't look mysterious anymore, but I went ahead and took some more
perf measurements, including for cache misses. My naive impression is that
we're spending a bit more time waiting for data, but having to do less work
with it once we get it, which is consistent with your earlier comments:

perf stat -p $pid sleep 2
v13:
          2,001.55 msec task-clock:u        #    1.000 CPUs utilized
                 0      context-switches:u  #    0.000 /sec
                 0      cpu-migrations:u    #    0.000 /sec
           311,690      page-faults:u       #  155.724 K/sec
     3,128,740,701      cycles:u            #    1.563 GHz
     4,739,333,861      instructions:u      #    1.51  insn per cycle
       820,014,588      branches:u          #  409.690 M/sec
         7,385,923      branch-misses:u     #    0.90% of all branches

+slab v4:
          2,001.09 msec task-clock:u        #    1.000 CPUs utilized
                 0      context-switches:u  #    0.000 /sec
                 0      cpu-migrations:u    #    0.000 /sec
           326,017      page-faults:u       #  162.920 K/sec
     3,016,668,818      cycles:u            #    1.508 GHz
     4,324,863,908      instructions:u      #    1.43  insn per cycle
       761,839,927      branches:u          #  380.712 M/sec
         7,718,366      branch-misses:u     #    1.01% of all branches

perf stat -e LLC-loads,LLC-load-misses -p $pid sleep 2
min/max of 3 runs:
v13: LL cache misses: 25.08% - 25.41%
+slab v4: LL cache misses: 25.74% - 26.01%
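
For reference, the derived ratios in the perf stat output can be recomputed from the raw counters. This is just a sketch with the numbers pasted in by hand; the dictionary layout and the function name derived are invented for illustration. Both runs sample a fixed 2-second window rather than a fixed amount of work, so the ratios are more comparable than the raw counts.

```python
# Counters copied from the two `perf stat -p $pid sleep 2` runs above.
# Both are fixed 2-second windows, so raw counts are not per-load totals;
# the derived ratios below are the more meaningful comparison.
counters = {
    "v13": {
        "cycles": 3_128_740_701,
        "instructions": 4_739_333_861,
        "branches": 820_014_588,
        "branch_misses": 7_385_923,
    },
    "slab_v4": {
        "cycles": 3_016_668_818,
        "instructions": 4_324_863_908,
        "branches": 761_839_927,
        "branch_misses": 7_718_366,
    },
}


def derived(c):
    """Instructions per cycle and branch miss rate for one run (hypothetical helper)."""
    return {
        "ipc": c["instructions"] / c["cycles"],
        "branch_miss_rate": c["branch_misses"] / c["branches"],
    }


metrics = {name: derived(c) for name, c in counters.items()}

for name, m in metrics.items():
    print(f"{name}: IPC = {m['ipc']:.2f}, "
          f"branch misses = {m['branch_miss_rate']:.2%}")
```

The slightly lower instructions-per-cycle figure with slab v4 fits the impression of waiting a bit longer for data, though the difference is small enough that I wouldn't read too much into it.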

--
John Naylor
EDB: http://www.enterprisedb.com
