From: | Gregory Smith <gregsmithpgsql(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15 |
Date: | 2023-06-09 07:27:51 |
Message-ID: | CAHLJuCX0NC7HOZPD-AOXjfQGE8j++sxXkLCcDkWecM_wMJoxzg@mail.gmail.com |
Lists: | pgsql-hackers |
Let me start with the happy ending to this thread:
$ pgbench -S -T 10 -c 32 -j 32 -M prepared -P 1 pgbench
pgbench (15.3 (Ubuntu 15.3-1.pgdg23.04+1))
progress: 1.0 s, 1015713.0 tps, lat 0.031 ms stddev 0.007, 0 failed
progress: 2.0 s, 1083780.4 tps, lat 0.029 ms stddev 0.007, 0 failed...
progress: 8.0 s, 1084574.1 tps, lat 0.029 ms stddev 0.001, 0 failed
progress: 9.0 s, 1082665.1 tps, lat 0.029 ms stddev 0.001, 0 failed
tps = 1077739.910163 (without initial connection time)
Which even seems a whole 0.9% faster than 14 on this hardware! The wonders
never cease.
On Thu, Jun 8, 2023 at 9:21 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> You might need to add --no-children to the perf report invocation, otherwise
> it'll show you the call graph inverted.
My problem was that kernel symbols weren't being written out; I was only
getting addresses for some reason. This worked:
sudo perf record -g --call-graph dwarf -d --phys-data -a sleep 1
perf report --stdio
And once I looked at the stack trace I immediately saw the problem, fixed
the config option, and this report is now closed as PEBKAC on my part.
Somehow I didn't notice the 15 installs on both systems had
log_min_duration_statement=0, and that's why the performance kept dropping
*only* on the fastest runs.
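
For anyone who wants to rule this out before reaching for perf, here is a minimal sketch of a check against the text of a postgresql.conf. The function names are hypothetical, and it only looks at one file: it ignores include files, ALTER SYSTEM, and per-role or per-database overrides, so running SHOW log_min_duration_statement on the live server is still the authoritative answer.

```python
import re

# Matches an uncommented "log_min_duration_statement = value" line.
# The "=" is optional in postgresql.conf, and names are case-insensitive.
_SETTING = re.compile(
    r"^\s*log_min_duration_statement(?:\s*=\s*|\s+)([^#]*)", re.IGNORECASE
)


def log_min_duration_setting(conf_text):
    """Return the last uncommented value found in conf_text, or None."""
    value = None
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match:
            # Later entries override earlier ones, so keep overwriting.
            value = match.group(1).strip().strip("'")
    return value


def logs_every_statement(value):
    """True if the value is zero, with or without a time unit."""
    if value is None:
        return False
    match = re.fullmatch(r"(-?\d+)\s*(us|ms|s|min|h|d)?", value)
    return bool(match) and int(match.group(1)) == 0


sample = """
#log_min_duration_statement = -1
log_min_duration_statement = 0    # left over from an earlier test
"""
setting = log_min_duration_setting(sample)
print(setting, logs_every_statement(setting))
```

A value of -1 disables duration logging, while 0 logs every statement, which is the case that hurt here.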
What I've learned today, then, is that if someone sees osq_lock in simple
perf top output on an oddly slow server, it's possible they are overloading a
device writing out log file data. Leaving out the boring parts, the call
trace you might see is:
EmitErrorReport
__GI___libc_write
ksys_write
__fdget_pos
mutex_lock
__mutex_lock_slowpath
__mutex_lock.constprop.0
71.20% osq_lock
Everyone was stuck trying to find the end of the log file to write to it,
and that was the entirety of the problem. Hope that call trace and info
helps out some future goofball making the same mistake. I'd wager this
will come up again.
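
To get a feel for why this only bites on the fastest runs, here is a back-of-the-envelope sketch of the log traffic implied by logging every statement at the rate shown at the top of this message. The 150 bytes per log line is a made-up average for illustration, not a measurement, and the function name is hypothetical.

```python
def log_write_load(tps, statements_per_txn=1, bytes_per_line=150):
    """Estimate log lines/s and MB/s when every statement is logged.

    bytes_per_line is an assumed average, not a measured value.
    """
    lines_per_sec = tps * statements_per_txn
    mb_per_sec = lines_per_sec * bytes_per_line / 1_000_000
    return lines_per_sec, mb_per_sec


# pgbench -S runs one SELECT per transaction; tps taken from the run above.
lines, mb = log_write_load(1_077_739)
print(f"{lines:,.0f} log lines/s, roughly {mb:.0f} MB/s")
```

Since the trace shows the time going to mutex_lock under ksys_write, the number of write calls per second is likely to matter at least as much as the byte count.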
Thanks to everyone who helped out and I'm looking forward to PG16 testing
now that I have this rusty, embarrassing warm-up out of the way.
--
Greg Smith greg(dot)smith(at)crunchydata(dot)com
Director of Open Source Strategy