From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: slab allocator performance issues
Date: 2021-09-10 21:06:54
Message-ID: a5ccda91-d9fc-49c5-b3c7-c81528b938c5@enterprisedb.com

Hi,

I've been investigating the regressions in some of the benchmark
results, together with the generation context benchmarks [1].

Turns out it's pretty difficult to benchmark this, because the results
strongly depend on what the backend did before. For example if I run
slab_bench_fifo with the "decreasing" test for 32kB blocks and 512B
chunks, I get this:

select * from slab_bench_fifo(1000000, 32768, 512, 100, 10000, 5000);

 mem_allocated | alloc_ms | free_ms
---------------+----------+---------
     528547840 |   155394 |   87440
(1 row)

i.e. palloc() takes ~155ms and pfree() ~87ms (and these results are
stable, the numbers don't change much with more runs).

But if I run a set of "lifo" tests in the backend first, the results
look like this:

 mem_allocated | alloc_ms | free_ms
---------------+----------+---------
     528547840 |    41728 |   71524
(1 row)

so the pallocs are suddenly about ~4x faster. Clearly, what the backend
did before may have a pretty dramatic impact on the results, even for
simple benchmarks like this.

Note: The benchmark was a single SQL script, running all the different
workloads in the same backend.

I did a fair amount of perf profiling, and the main difference between
the slow and fast runs seems to be this:

                 0      page-faults:u
                 0      minor-faults:u
                 0      major-faults:u

vs

        20,634,153      page-faults:u
        20,634,153      minor-faults:u
                 0      major-faults:u
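
(Counters like these can be collected with something along the lines of

  perf stat -e page-faults:u,minor-faults:u,major-faults:u -p <backend pid>

attached to the backend for the duration of the benchmark.)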

Attached is a more complete perf stat output, but the page faults seem
to be the main issue. My theory is that in the "fast" case, the past
backend activity leaves glibc memory management in a state where the
heap pages the benchmark needs are already mapped, so the allocations
don't trigger any page faults.
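
To make the theory a bit more concrete, here is a minimal stand-alone
sketch - everything in it is invented (fault_demo.c, churn), nothing
comes from slab.c or the benchmark module - showing how glibc's trim
heuristics decide whether a second burst of allocations has to fault
its pages in again:

/*
 * Build & measure with something like:
 *
 *     cc -O2 fault_demo.c -o fault_demo
 *     perf stat -e minor-faults:u ./fault_demo
 */
#include <malloc.h>
#include <stdlib.h>
#include <string.h>

#define NBLOCKS 10000
#define BLKSZ   32768        /* 32kB blocks, as in the benchmark */

static char *blocks[NBLOCKS];

static void
churn(void)
{
    for (int i = 0; i < NBLOCKS; i++)
    {
        blocks[i] = malloc(BLKSZ);
        if (blocks[i] == NULL)
            abort();
        memset(blocks[i], 0, BLKSZ);    /* first touch => minor fault */
    }

    for (int i = 0; i < NBLOCKS; i++)
        free(blocks[i]);
}

int
main(void)
{
    /*
     * Uncommenting this tells glibc to never give heap pages back to
     * the kernel, which should make the second churn() run almost
     * fault-free - mimicking the "warmed up" backend:
     *
     * mallopt(M_TRIM_THRESHOLD, -1);
     */
    churn();    /* faults in ~320MB of fresh heap */
    churn();    /* faults again only if glibc trimmed the heap meanwhile */

    return 0;
}

If that's what happens, it would also explain why the "lifo" tests
condition the backend - they leave enough free (but still mapped) heap
behind for the subsequent "fifo" test to reuse.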

Of course, this theory may be incomplete - for example, it's not clear
why running the benchmark repeatedly doesn't "condition" the backend the
same way. But it doesn't - the allocations stay at ~155ms even for
repeated runs.

Secondly, I'm not sure this explains why some of the timings actually
got much slower with the 0003 patch, when the sequence of steps is still
the same. Of course, it's possible 0003 changes the allocation pattern a
bit, interfering with glibc memory management.

This leads to a few interesting questions, I think:

1) I've only tested this on Linux, with glibc. I wonder how it'd behave
on other platforms, or with other allocators.

2) Which case is more important - a warmed-up backend, or each benchmark
running in a new backend? The "new backend" seems to be something like a
worst case, leading to more page faults, so maybe that's the thing to
watch. OTOH real workloads rarely run in a completely new backend, so
maybe not.

3) Can this teach us something about how to allocate memory, to better
"prepare" the backend for future allocations (see the sketch below)? For
example, it's a bit strange that repeated runs of the same benchmark
don't do the trick.
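
For example (a purely illustrative sketch, not a proposal - the function
name prepare_backend_heap is made up), the crudest version of "preparing"
the backend might be just telling glibc to hold on to the heap:

#include <malloc.h>

/*
 * Hypothetical conditioning at backend start - keep freed heap pages
 * mapped instead of returning them to the kernel, so that future
 * allocation bursts don't have to fault them in again.
 */
void
prepare_backend_heap(void)
{
    mallopt(M_TRIM_THRESHOLD, -1);
}

Whether trading a permanently larger RSS for fewer page faults is a good
idea is a separate question, of course.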

regards

[1]
https://www.postgresql.org/message-id/bcdd4e3e-c12d-cd2b-7ead-a91ad416100a%40enterprisedb.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment      Content-Type   Size
perf.stat.txt   text/plain     2.0 KB
