Re: Pg stuck at 100% cpu, for multiple days

From: Joe Conway <mail(at)joeconway(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: depesz(at)depesz(dot)com, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pg stuck at 100% cpu, for multiple days
Date: 2021-08-31 00:15:24
Message-ID: 257d9bd3-6cd4-4307-2a6d-f78a5b9eba7d@joeconway.com
Lists: pgsql-hackers

On 8/30/21 3:34 PM, Justin Pryzby wrote:
> On Mon, Aug 30, 2021 at 09:09:20PM +0200, Laurenz Albe wrote:
>> On Mon, 2021-08-30 at 17:18 +0200, hubert depesz lubaczewski wrote:
>> > The thing is - I can't close it with pg_terminate_backend(), and I'd
>> > rather not kill -9, as it will, I think, close all other connections,
>> > and this is prod server.
>>
>> Of course the cause should be fixed, but to serve your immediate need:
>
> You might save a coredump of the process using gdb gcore before killing it, in
> case someone thinks how to debug it next month.
>
> Depending on your OS, you might have to do something special to get shared
> buffers included in the dump (or excluded, if that's what's desirable).
>
> I wonder how far up the stacktrace it's stuck ?
> You could set a breakpoint on LogicalDecodingProcessRecord and then "c"ontinue,
> and see if it hits the breakpoint in a few seconds. If not, try the next
> frame until you know which one is being called repeatedly.
>
> Maybe CheckForInterrupts should be added somewhere...

The spot in the backtrace...

#0  hash_seq_search (status=status@entry=0xffffdd90f380) at ./build/../src/backend/utils/hash/dynahash.c:1448

...is in the middle of this while loop:
8<-----------------------------------------
while ((curElem = segp[segment_ndx]) == NULL)
{
    /* empty bucket, advance to next */
    if (++curBucket > max_bucket)
    {
        status->curBucket = curBucket;
        hash_seq_term(status);
        return NULL;    /* search is done */
    }
    if (++segment_ndx >= ssize)
    {
        segment_num++;
        segment_ndx = 0;
        segp = hashp->dir[segment_num];
    }
}
8<-----------------------------------------

It would be interesting to step through that loop a few times in gdb to
see whether the backend is really stuck there. That would be consistent
with both the 100% CPU and the lack of interrupt checks, I think.
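
To make it easier to reason about what stepping should show, here is a
minimal standalone sketch of that scan. It is a toy model and not the
dynahash code: ToyHash, ToySeqStatus, toy_seq_search, toy_demo_count and
the SSIZE/NSEGS constants are all made up for illustration, and the
hash_seq_term() bookkeeping is left out.

```c
#include <assert.h>
#include <stddef.h>

#define SSIZE 4     /* buckets per segment (toy value, an assumption) */
#define NSEGS 2     /* segments in the directory (toy value, an assumption) */

typedef struct Elem
{
    struct Elem *next;  /* next entry chained in the same bucket */
    int          key;
} Elem;

typedef struct
{
    Elem    **dir[NSEGS];   /* directory: one bucket array per segment */
    unsigned  max_bucket;   /* highest bucket number in use */
} ToyHash;

typedef struct
{
    ToyHash  *hashp;
    unsigned  curBucket;    /* bucket the scan will look at next */
    Elem     *curEntry;     /* next entry in the current chain, or NULL */
} ToySeqStatus;

/* Return the next element of the scan, or NULL when the scan is done. */
Elem *
toy_seq_search(ToySeqStatus *status)
{
    ToyHash  *hashp = status->hashp;
    Elem     *curElem;

    /* Still inside a bucket chain: hand out the next entry. */
    if ((curElem = status->curEntry) != NULL)
    {
        status->curEntry = curElem->next;
        if (status->curEntry == NULL)
            ++status->curBucket;    /* chain exhausted, move on */
        return curElem;
    }

    unsigned curBucket = status->curBucket;
    unsigned max_bucket = hashp->max_bucket;

    if (curBucket > max_bucket)
        return NULL;                /* scan already finished */

    unsigned segment_num = curBucket / SSIZE;
    unsigned segment_ndx = curBucket % SSIZE;
    Elem   **segp = hashp->dir[segment_num];

    /* Same shape as the loop quoted above: skip empty buckets. */
    while ((curElem = segp[segment_ndx]) == NULL)
    {
        if (++curBucket > max_bucket)
        {
            status->curBucket = curBucket;
            return NULL;            /* search is done */
        }
        if (++segment_ndx >= SSIZE)
        {
            segment_num++;
            segment_ndx = 0;
            segp = hashp->dir[segment_num];
        }
    }

    /* Found a non-empty bucket; remember where to resume. */
    status->curEntry = curElem->next;
    if (status->curEntry == NULL)
        ++curBucket;
    status->curBucket = curBucket;
    return curElem;
}

/* Build a tiny table (3 entries in 8 buckets) and count what a scan returns. */
int
toy_demo_count(void)
{
    Elem   *seg0[SSIZE] = {NULL};
    Elem   *seg1[SSIZE] = {NULL};
    Elem    e1 = {NULL, 10};
    Elem    e2 = {&e1, 11};         /* e2 -> e1 chained in bucket 1 */
    Elem    e3 = {NULL, 20};        /* alone in bucket 6 */
    ToyHash h = {{seg0, seg1}, NSEGS * SSIZE - 1};
    ToySeqStatus status = {&h, 0, NULL};
    int     count = 0;

    assert(h.max_bucket < NSEGS * SSIZE);   /* toy-only sanity check */
    seg0[1] = &e2;
    seg1[2] = &e3;

    while (toy_seq_search(&status) != NULL)
        count++;
    return count;
}
```

In this model every pass through the while loop increments curBucket, so
a single call can spin at most max_bucket + 1 times before it returns.
Whether curBucket and max_bucket behave that way in the stuck backend is
exactly what a few steps in gdb would tell us.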

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
