| From: | Scott Ribe <scott_ribe(at)elevated-dev(dot)com> |
|---|---|
| To: | Aislan Luiz Wendling <aislanluiz(at)hotmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: debug a lockup |
| Date: | 2026-02-11 00:12:19 |
| Message-ID: | A724D8F7-D1F2-401E-8307-618AAF5B2A13@elevated-dev.com |
| Lists: | pgsql-admin |
OK, we figured it out--I think.

- pgbench was stuck in restart_syscall(<...resuming interrupted read...
- it was configured to open 100 connections
- there were ~20 PG sessions sitting idle, and the last one (highest pid) was in auth
- that one was blocked in a write to fd 2 (stderr)
So... this is running in Kubernetes. I was doing some load testing against a storage service (hence the 100 connections). PG was launched manually in a bash session connected to the pod via k9s. There were ~20 total bash sessions open in k9s across 15 nodes.
Theory: k9s glitched and stopped reading the piped file descriptor, the pipe buffer filled, and PG blocked on the write. (I have seen prior evidence of less-than-perfect output handling by k9s.) In particular, I had connection logging turned on, so at auth the backend would have been writing to stderr.
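The failure mode in the theory above--a consumer stops draining a pipe, the kernel buffer fills, and the writer blocks--can be sketched with a minimal Python example. This is an illustration only, not taken from the affected pod; it uses a non-blocking write so we observe EAGAIN instead of actually hanging the way the PG backend did:

```python
import os

# Create a pipe and never read from it, mimicking a stalled consumer
# (the k9s terminal in the theory above).
r, w = os.pipe()
os.set_blocking(w, False)  # non-blocking, so a full buffer raises instead of blocking

written = 0
try:
    while True:
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    # Pipe buffer is full. A *blocking* writer -- like a backend
    # writing its auth log line to stderr -- would hang right here.
    pass

print(f"pipe buffer filled after {written} bytes")
os.close(r)
os.close(w)
```

On Linux the default pipe capacity is typically 64 KiB, so the loop usually stops after 65536 bytes; the exact figure is irrelevant to the theory--what matters is that the capacity is finite and a blocked reader eventually stalls the writer.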
This happened in one of probably more than 100 runs of the same test, so it's not readily reproducible, and I wanted to autopsy it before killing off the hung processes. Unless someone pokes a hole in my theory, at this point I think neither pgbench nor PG nor Pure/Portworx is at fault.
--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
https://www.linkedin.com/in/scottribe/