Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15

From: Gregory Smith <gregsmithpgsql(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Major pgbench synthetic SELECT workload regression, Ubuntu 23.04+PG15
Date: 2023-06-09 07:27:51
Message-ID: CAHLJuCX0NC7HOZPD-AOXjfQGE8j++sxXkLCcDkWecM_wMJoxzg@mail.gmail.com
Lists: pgsql-hackers

Let me start with the happy ending to this thread:

$ pgbench -S -T 10 -c 32 -j 32 -M prepared -P 1 pgbench
pgbench (15.3 (Ubuntu 15.3-1.pgdg23.04+1))
progress: 1.0 s, 1015713.0 tps, lat 0.031 ms stddev 0.007, 0 failed
progress: 2.0 s, 1083780.4 tps, lat 0.029 ms stddev 0.007, 0 failed
...
progress: 8.0 s, 1084574.1 tps, lat 0.029 ms stddev 0.001, 0 failed
progress: 9.0 s, 1082665.1 tps, lat 0.029 ms stddev 0.001, 0 failed
tps = 1077739.910163 (without initial connection time)

That even seems a whole 0.9% faster than PG14 on this hardware! The wonders
never cease.

On Thu, Jun 8, 2023 at 9:21 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> You might need to add --no-children to the perf report invocation,
> otherwise it'll show you the call graph inverted.
>

My problem was that kernel symbols were not being written out; I was only
getting addresses for some reason. This worked:

$ sudo perf record -g --call-graph dwarf -d --phys-data -a sleep 1
$ perf report --stdio

And once I looked at the stack trace I immediately saw the problem, fixed
the config option, and this report is now closed as PEBKAC on my part.
Somehow I didn't notice the PG15 installs on both systems had
log_min_duration_statement=0, which logs every statement, and that's why
the performance kept dropping *only* on the fastest runs.
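
For anyone who wants to rule this out before benchmarking, here is a minimal
sketch of a check. The function name classify_log_min_duration is
hypothetical, and the commented psql call assumes psql can reach the server
under test with its default connection settings. In PostgreSQL, -1 disables
duration logging, 0 logs every statement, and a positive value is a
threshold.

```shell
#!/bin/sh
# Hypothetical helper: classify a log_min_duration_statement value.
#   -1        duration logging is disabled
#   0         every statement is logged (the mistake described above)
#   otherwise only statements at least that long are logged
classify_log_min_duration() {
    case "$1" in
        -1) echo "disabled" ;;
        0)  echo "logs-every-statement" ;;
        *)  echo "threshold" ;;
    esac
}

# Usage against a live server (assumes default psql connection settings):
#   classify_log_min_duration "$(psql -Atc 'SHOW log_min_duration_statement')"
```

If that prints logs-every-statement on a benchmark box, setting the
parameter back to -1 and reloading the configuration is one way to take
statement logging out of the picture.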

What I've learned today then is that if someone sees osq_lock in simple
perf top output on an oddly slow server, it's possible they are overloading
a device writing out log file data. Leaving out the boring parts, the call
trace you might see is:

EmitErrorReport
__GI___libc_write
ksys_write
__fdget_pos
mutex_lock
__mutex_lock_slowpath
__mutex_lock.constprop.0
71.20% osq_lock

Everyone was stuck trying to find the end of the log file to write to it,
and that was the entirety of the problem. Hope that call trace and info
helps out some future goofball making the same mistake. I'd wager this
will come up again.
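
As a rough illustration of spotting that signature, the sketch below scans
perf report text on standard input for both osq_lock and EmitErrorReport.
The function name and the two-symbol heuristic are assumptions made here
for illustration, not a perf feature; a match only suggests that log writes
are worth a look.

```shell
#!/bin/sh
# Hypothetical heuristic: flag perf report text that contains both
# osq_lock and EmitErrorReport, the two ends of the trace shown above.
check_perf_for_log_contention() {
    report=$(cat)
    if printf '%s\n' "$report" | grep -q 'osq_lock' &&
       printf '%s\n' "$report" | grep -q 'EmitErrorReport'; then
        echo "possible log write contention"
    else
        echo "no match"
    fi
}

# Usage, after recording with the perf record command shown earlier:
#   perf report --stdio | check_perf_for_log_contention
```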

Thanks to everyone who helped out and I'm looking forward to PG16 testing
now that I have this rusty, embarrassing warm-up out of the way.

--
Greg Smith greg(dot)smith(at)crunchydata(dot)com
Director of Open Source Strategy
