Re: debug a lockup

From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Aislan Luiz Wendling <aislanluiz(at)hotmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: debug a lockup
Date: 2026-02-11 00:12:19
Message-ID: A724D8F7-D1F2-401E-8307-618AAF5B2A13@elevated-dev.com
Lists: pgsql-admin

OK, we figured it out--I think.

pgbench was stuck in restart_syscall(<...resuming interrupted read...

it was set to open 100 connections

there were ~20 PG sessions idle, and the last one (highest pid) was in auth

that one was blocked in a write to fd 2 (stderr)
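For anyone repeating this kind of autopsy, here is a minimal sketch of one way to check what a process's fd 2 actually points at. It assumes Linux with /proc mounted and permission to inspect the target process; the helper name describe_fd is made up for illustration and is not part of any PG tooling.

```python
import os


def describe_fd(pid: int, fd: int) -> str:
    """Return the target of /proc/<pid>/fd/<fd> (Linux only).

    A pipe shows up as 'pipe:[inode]', a terminal as '/dev/pts/N',
    and a regular file as its path.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


if __name__ == "__main__":
    # Demo against our own process; for a hung backend, pass its pid instead.
    print("fd 2 ->", describe_fd(os.getpid(), 2))
```

If fd 2 resolves to a pipe or a pty rather than a file, whatever sits at the other end has to keep reading for writes to complete.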

So... This is running in kubernetes. I was doing some load testing against a storage service (thus 100 connections). PG was launched manually in a bash session connected to the pod, in k9s. There were ~20 total bash sessions open in k9s across 15 nodes.

Theory: k9s glitched and stopped reading the piped file descriptor, the buffer filled, and PG blocked on the write. (I have seen prior evidence of less-than-perfect handling of output by k9s.) In particular, I had logging of connections on, so at auth it would have been writing to stderr.
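The mechanism in this theory is easy to demonstrate in isolation. The sketch below is not the actual PG or k9s setup; it assumes stderr is an ordinary pipe (a pty or socket would behave similarly, with a different buffer size) and uses a hypothetical helper, fill_pipe, to show that a pipe nobody reads accepts only a finite number of bytes. The write end is made non-blocking so the demo reports the limit instead of hanging.

```python
import os


def fill_pipe(chunk_size: int = 4096) -> int:
    """Write to a pipe that nobody reads until the kernel buffer is full.

    Returns the number of bytes the pipe accepted.
    """
    read_fd, write_fd = os.pipe()
    # Non-blocking, so a full buffer raises instead of putting us to sleep.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    written = 0
    try:
        while True:
            written += os.write(write_fd, chunk)
    except BlockingIOError:
        # A writer in the default blocking mode would sleep inside write()
        # at this point until the reader drained the pipe.
        pass
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


if __name__ == "__main__":
    print(f"pipe accepted {fill_pipe()} bytes before a write would block")
```

With a blocking descriptor, which is what a process logging to stderr normally has, the same final write simply never returns while the reader is stalled.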

This happened in one of probably over 100 runs of the same test, so it is not readily reproducible, and I wanted to autopsy it before killing off the hung processes. Unless someone pokes a hole in my theory, at this point I think it is neither pgbench nor PG nor Pure/Portworx at fault.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
https://www.linkedin.com/in/scottribe/
