From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: slab allocator performance issues
Date: 2021-09-10 21:06:54
Message-ID: a5ccda91-d9fc-49c5-b3c7-c81528b938c5@enterprisedb.com

Hi,

I've been investigating the regressions in some of the benchmark
results, together with the generation context benchmarks [1].

Turns out it's pretty difficult to benchmark this, because the results
strongly depend on what the backend did before. For example if I run
slab_bench_fifo with the "decreasing" test for 32kB blocks and 512B
chunks, I get this:

select * from slab_bench_fifo(1000000, 32768, 512, 100, 10000, 5000);

 mem_allocated | alloc_ms | free_ms
---------------+----------+---------
     528547840 |   155394 |   87440
(1 row)

i.e. palloc() takes ~155ms and pfree() ~87ms (and these results are
stable, the numbers don't change much with more runs).

But if I run a set of "lifo" tests in the backend first, the results
look like this:

 mem_allocated | alloc_ms | free_ms
---------------+----------+---------
     528547840 |    41728 |   71524
(1 row)

so the pallocs are suddenly about ~4x faster. Clearly, what the backend
did before may have a pretty dramatic impact on the results, even for
simple benchmarks like this.

Note: The benchmark was a single SQL script, running all the different
workloads in the same backend.

I did a fair amount of perf profiling, and the main difference between
the slow and fast runs seems to be this:

                 0      page-faults:u
                 0      minor-faults:u
                 0      major-faults:u

vs

        20,634,153      page-faults:u
        20,634,153      minor-faults:u
                 0      major-faults:u
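
(Counters like these can be collected with something along the lines of

  perf stat -e page-faults:u,minor-faults:u,major-faults:u -p <backend pid>

attached to the backend for the duration of the benchmark.)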

Attached is a more complete perf stat output, but the page faults seem
to be the main issue. My theory is that in the "fast" case, the past
backend activity leaves glibc memory management in a state where the
heap pages the benchmark needs are already mapped, so the allocations
don't trigger any page faults.
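
To make the theory a bit more concrete, here is a minimal stand-alone
sketch - everything in it is invented (fault_demo.c, churn), nothing
comes from slab.c or the benchmark module - showing how glibc's trim
heuristics decide whether a second burst of allocations has to fault
its pages in again:

/*
 * Build & measure with something like:
 *
 *     cc -O2 fault_demo.c -o fault_demo
 *     perf stat -e minor-faults:u ./fault_demo
 */
#include <malloc.h>
#include <stdlib.h>
#include <string.h>

#define NBLOCKS 10000
#define BLKSZ   32768        /* 32kB blocks, as in the benchmark */

static char *blocks[NBLOCKS];

static void
churn(void)
{
    for (int i = 0; i < NBLOCKS; i++)
    {
        blocks[i] = malloc(BLKSZ);
        if (blocks[i] == NULL)
            abort();
        memset(blocks[i], 0, BLKSZ);    /* first touch => minor fault */
    }

    for (int i = 0; i < NBLOCKS; i++)
        free(blocks[i]);
}

int
main(void)
{
    /*
     * Uncommenting this tells glibc to never give heap pages back to
     * the kernel, which should make the second churn() run almost
     * fault-free - mimicking the "warmed up" backend:
     *
     * mallopt(M_TRIM_THRESHOLD, -1);
     */
    churn();    /* faults in ~320MB of fresh heap */
    churn();    /* faults again only if glibc trimmed the heap meanwhile */

    return 0;
}

If that's what happens, it would also explain why the "lifo" tests
condition the backend - they leave enough free (but still mapped) heap
behind for the subsequent "fifo" test to reuse.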

Of course, this theory may be incomplete - for example, it's not clear
why running the benchmark repeatedly doesn't "condition" the backend the
same way. But it doesn't - the allocations stay at ~155ms even for
repeated runs.

Secondly, I'm not sure this explains why some of the timings actually
got much slower with the 0003 patch, when the sequence of steps is still
the same. Of course, it's possible 0003 changes the allocation pattern a
bit, interfering with glibc memory management.

This leads to a few interesting questions, I think:

1) I've only tested this on Linux, with glibc. I wonder how it'd behave
on other platforms, or with other allocators.

2) Which case is more important - a warmed-up backend, or each benchmark
running in a new backend? The "new backend" seems to be something like a
worst case, leading to more page faults, so maybe that's the thing to
watch. OTOH real workloads rarely run in a completely new backend, so
maybe not.

3) Can this teach us something about how to allocate memory, to better
"prepare" the backend for future allocations (see the sketch below)? For
example, it's a bit strange that repeated runs of the same benchmark
don't do the trick.
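
For example (a purely illustrative sketch, not a proposal - the function
name prepare_backend_heap is made up), the crudest version of "preparing"
the backend might be just telling glibc to hold on to the heap:

#include <malloc.h>

/*
 * Hypothetical conditioning at backend start - keep freed heap pages
 * mapped instead of returning them to the kernel, so that future
 * allocation bursts don't have to fault them in again.
 */
void
prepare_backend_heap(void)
{
    mallopt(M_TRIM_THRESHOLD, -1);
}

Whether trading a permanently larger RSS for fewer page faults is a good
idea is a separate question, of course.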

regards

[1]
https://www.postgresql.org/message-id/bcdd4e3e-c12d-cd2b-7ead-a91ad416100a%40enterprisedb.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment      Content-Type   Size
perf.stat.txt   text/plain     2.0 KB
