Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Date: 2015-01-20 16:08:16
Message-ID: 54BE7D70.7050606@2ndquadrant.com
Lists: pgsql-hackers

On 25.12.2014 22:28, Tomas Vondra wrote:
> On 25.12.2014 21:14, Andres Freund wrote:
>
>> That's indeed odd. Seems to have been lost when the statsfile was
>> split into multiple files. Alvaro, Tomas?
>
> The goal was to keep the logic as close to the original as possible.
> IIRC there were "pgstat wait timeout" issues before, and in most cases
> the conclusion was that it's probably because of overloaded I/O.
>
> But maybe there actually was another bug, and it's entirely possible
> that the split introduced a new one, and that's what we're seeing now.
> The strange thing is that the split happened ~2 years ago, which is
> inconsistent with the sudden increase in this kind of issue. So maybe
> something changed on that particular animal (a failing SD card causing
> I/O stalls, perhaps)?
>
> Anyway, I happen to have a spare Raspberry Pi, so I'll try to reproduce
> and analyze the issue locally. But that won't happen until January.

I've tried to reproduce this on my Raspberry Pi 'machine', and it's not
difficult to trigger. About 7 out of 10 'make check' runs fail because
of 'pgstat wait timeout'.
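For anyone wanting to repeat the experiment, here is a minimal sketch of a
repeated-run harness in Python. It assumes a configured and built source
tree, and it assumes the warning shows up either in make's output or in
src/test/regress/regression.diffs. The function names (run_once,
failure_rate) and the cmd parameter are hypothetical, not part of any
PostgreSQL tooling.

```python
"""Hypothetical harness: run `make check` repeatedly and count how many
runs hit the 'pgstat wait timeout' warning."""

import subprocess
from pathlib import Path

MARKER = "pgstat wait timeout"


def run_once(srcdir: str, cmd=("make", "check")) -> bool:
    """Run the test command once in srcdir; True if the warning appeared.

    ASSUMPTION: the warning surfaces in the command's output or in
    src/test/regress/regression.diffs (left behind when a test fails).
    """
    diffs = Path(srcdir, "src", "test", "regress", "regression.diffs")
    # Delete any diffs file from an earlier run so it is not counted twice.
    diffs.unlink(missing_ok=True)
    proc = subprocess.run(list(cmd), cwd=srcdir, capture_output=True, text=True)
    text = proc.stdout + proc.stderr
    if diffs.exists():
        text += diffs.read_text(errors="replace")
    return MARKER in text


def failure_rate(srcdir: str, runs: int = 10, cmd=("make", "check")) -> float:
    """Fraction of `runs` runs in which the warning appeared."""
    hits = sum(run_once(srcdir, cmd) for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    import sys

    tree = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{failure_rate(tree):.0%} of runs hit '{MARKER}'")
```

Note that the harness deletes regression.diffs before each run, so copy it
elsewhere first if you want to keep it.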

All the occurrences I've seen were right after some sort of VACUUM
(sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
looked something like this:

    Device:     rrqm/s   wrqm/s      r/s      w/s    rkB/s    wkB/s avgrq-sz avgqu-sz    await  r_await  w_await    svctm    %util
    mmcblk0       0.00    75.00     0.00     8.00     0.00    36.00     9.00     5.73 15633.75     0.00 15633.75   125.00   100.00

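So, pretty terrible: the card completes only 8 writes/s (36 kB/s) at 100%
utilization, and each write waits about 15.6 seconds on average. This is
a Class 4 SD card, supposedly able to handle 4 MB/s. If hamster had a
faulty SD card, it might have been much worse, I guess.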
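The iostat numbers are internally consistent, which can be cross-checked
with a few lines of arithmetic. The sketch below (Python, device values
copied from the sample) recomputes avgrq-sz and %util and compares the
write throughput with the Class 4 rating. ASSUMED_STATS_WAIT_MS is a
hypothetical ~10 s wait budget, used only to put the 15.6 s write latency
in perspective. It is not a value taken from this message or from
PostgreSQL's source.

```python
"""Back-of-the-envelope check of the mmcblk0 iostat sample.

Device figures are copied from the sample; ASSUMED_STATS_WAIT_MS is a
hypothetical budget, not a PostgreSQL constant.
"""

SECTOR_BYTES = 512  # iostat reports avgrq-sz in 512-byte sectors

# Figures from the iostat sample
W_PER_S = 8.00
WKB_PER_S = 36.00
AVGRQ_SZ_SECTORS = 9.00
W_AWAIT_MS = 15633.75
SVCTM_MS = 125.00

# Class 4 SD cards are rated for 4 MB/s (decimal MB); convert to iostat's kB.
CLASS4_KB_S = 4_000_000 / 1024

# ASSUMPTION: hypothetical ~10 s a backend might wait for fresh stats.
ASSUMED_STATS_WAIT_MS = 10_000


def avg_request_sectors(wkb_per_s: float, w_per_s: float) -> float:
    """Average write size in 512-byte sectors (should match avgrq-sz)."""
    return wkb_per_s * 1024 / w_per_s / SECTOR_BYTES


def utilization_pct(ops_per_s: float, svctm_ms: float) -> float:
    """%util as I/Os per second times service time, capped at 100.

    iostat derives svctm from busy time per I/O, so this is only a
    consistency check, not an independent measurement.
    """
    return min(100.0, ops_per_s * svctm_ms / 10.0)


def fraction_of_rating(wkb_per_s: float) -> float:
    """Observed write throughput as a fraction of the Class 4 rating."""
    return wkb_per_s / CLASS4_KB_S


if __name__ == "__main__":
    print(f"avg request: {avg_request_sectors(WKB_PER_S, W_PER_S):.2f} sectors")
    print(f"utilization: {utilization_pct(W_PER_S, SVCTM_MS):.1f} %")
    print(f"throughput: {fraction_of_rating(WKB_PER_S):.2%} of Class 4 rating")
    print(f"mean write wait {W_AWAIT_MS / 1000:.1f} s exceeds assumed budget: "
          f"{W_AWAIT_MS > ASSUMED_STATS_WAIT_MS}")
```

In other words, the card delivers well under 1% of its rated write
throughput while staying fully busy, which fits the picture of I/O stalls
rather than a lack of bandwidth.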

This of course does not prove the absence of a bug; I plan to dig into
this a bit more. Feel free to point out any suspicious scenarios that
might be worth reproducing and analyzing.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
