Re: Auto-vacuum timing out and preventing connections

From: David Johansen <davejohansen(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Auto-vacuum timing out and preventing connections
Date: 2022-06-28 20:05:31
Message-ID: CAAcYxUfMtLYdhyyLdoP0LWn5J-dRjvuixQVs_HQi8Nn7cOeeSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Jun 28, 2022 at 1:31 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Mon, Jun 27, 2022 at 4:38 PM David Johansen <davejohansen(at)gmail(dot)com>
> wrote:
>
>> We're running into an issue where the database can't be connected to. It
>> appears that the auto-vacuum is timing out and then that prevents new
>> connections from happening. This assumption is based on these logs showing
>> up in the logs:
>> WARNING: worker took too long to start; canceled
>> The log appears about every 5 minutes and eventually nothing can connect
>> to it and it has to be rebooted.
>>
>
> As Julien suggested, this sounds like another victim, not the cause. Is
> there anything else in the log files?
>

That's the only thing in the logs for the 12-24 hours before the database
becomes inaccessible.

> What version are you using?
>

13.6

> These are the most similarly related previous posts, but the CPU usage
>> isn't high when this happens, so I don't believe that's the problem
>>
>> https://www.postgresql.org/message-id/20081105185206.GS4114%40alvh.no-ip.org
>>
>> https://www.postgresql.org/message-id/AANLkTinsGLeRc26RT5Kb4_HEhow5e97p0ZBveg=p9xqS@mail.gmail.com
>>
>
> But, I don't see high CPU described as a symptom in either of those
> threads.
>

I was referring to the "I've seen this happen under heavy load" statement.
Not sure that's the cause or related in those posts, but it doesn't appear
to be the issue here.

> If you can't reproduce the problem locally, there probably isn't much we
> can do. Maybe ask Amazon to look into it, since they are the only ones
> with sufficient access to do so.
>

We've opened a support case, but I was trying to be proactive and seeing
what we could dig into on our end. Is there a way to tell which table the
auto-vacuum is trying to run on and timing out with?

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2022-06-29 06:47:57 Re: BUG #17385: "RESET transaction_isolation" inside serializable transaction causes Assert at the transaction end
Previous Message Dave Cramer 2022-06-28 19:33:27 Re: BUG #17518: Getting Error "new multixact has more than one updating member" when trying to delete records.