Re: pgstat wait timeout

From: Steve Crawford <scrawford(at)pinpointresearch(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgstat wait timeout
Date: 2011-12-28 18:05:49
Message-ID: 4EFB5A7D.70904@pinpointresearch.com
Lists: pgsql-hackers

On 12/28/2011 09:34 AM, Alvaro Herrera wrote:
> Excerpts from Steve Crawford's message of Wed Dec 28 13:24:37 -0300 2011:
>> On 12/28/2011 05:05 AM, Alvaro Herrera wrote:
>>> Excerpts from Steve Crawford's message of Tue Dec 27 22:51:06 -0300 2011:
>>>> I have a system (9.0.4 on Ubuntu Server 10.04 LTS x86_64) that is
>>>> currently in test/dev mode. I'm currently seeing the following messages
>>>> occurring every few seconds:
>>>>
>>>> ...
>>>> Dec 27 17:43:22 foo postgres[23693]: [6-1] : WARNING: pgstat wait timeout
>>>> Dec 27 17:43:27 foo postgres[27324]: [71400-1] : WARNING: pgstat wait timeout
>>>> Dec 27 17:43:33 foo postgres[23695]: [6-1] : WARNING: pgstat wait timeout
>>>> Dec 27 17:43:54 foo postgres[27324]: [71401-1] : WARNING: pgstat wait timeout
>>> Hm, so can you strace the stats collector to see what it's doing? Maybe
>>> grab a backtrace with GDB from it before anything else.
>>>
>>> My guess is 27324 is the autovac launcher and the others are autovac
>>> workers just as they die.
>>>
>> You are correct. 27324 is the launcher and the others are autovac
>> workers. Here's the strace of the stats collector process:
>>
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> getppid() = 27320
>> poll([{fd=8, events=POLLIN|POLLERR}], 1, 2000) = 0 (Timeout)
>> ....rinse...lather...repeat...ad nauseam...
> Weird ... even across more "pgstat wait timeout" messages? It's like
> it's not getting the "inquiry" messages that would tell it to write the
> file ... something wrong with the UDP socket perhaps?
>
Bingo!

postgres 27325 postgres 8u *IPv6* 5379428 0t0 UDP localhost:47204->localhost:47204

While diagnosing a network timeout issue over an IPv4-to-IPv4 VPN, I
disabled IPv6 via sysctl on this machine and pretty much forgot about it,
since we are still IPv4 internally. But PostgreSQL had already
established its (now non-functional) IPv6 local connection. Since IPv6
turned out to be unrelated to the VPN timeouts, I re-enabled it, and that
corrected the "pgstat wait timeout" issue.
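
For anyone who wants to check for the same condition, here is a minimal
sketch (in Python, not PostgreSQL's own code) that mimics the shape of the
socket in the lsof output above: a UDP socket bound to localhost and
connected to itself. The function name loopback_udp_roundtrip and the
one-second timeout are my own choices for illustration. If the IPv6 call
returns False while the IPv4 one returns True, local IPv6 datagrams are
not getting through, which is at least consistent with what I saw here.

```python
import socket


def loopback_udp_roundtrip(family, timeout=1.0):
    """Return True if a datagram sent to our own loopback UDP socket arrives.

    Hypothetical helper for illustration only; it is not part of PostgreSQL.
    """
    addr = "::1" if family == socket.AF_INET6 else "127.0.0.1"
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.bind((addr, 0))              # let the kernel pick a free port
            sock.connect(sock.getsockname())  # connect the socket to itself
            sock.send(b"x")
            return sock.recv(1) == b"x"
    except OSError:
        # Covers bind/connect/send failures as well as the receive timeout.
        return False


if __name__ == "__main__":
    print("IPv4 loopback UDP ok:", loopback_udp_roundtrip(socket.AF_INET))
    print("IPv6 loopback UDP ok:", loopback_udp_roundtrip(socket.AF_INET6))
```

Note that this only tests a freshly created socket. In my case the socket
had been created while IPv6 was still enabled, so the check says whether
loopback IPv6 works now, not whether an already-open socket is healthy.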

Cheers,
Steve
