Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Date: 2015-01-20 23:38:57
Message-ID: CAB7nPqRyOVYiLkrQkjrB1ozjoNaPBK_-ApV_8L+vnAKQHju=-g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> On 25.12.2014 22:28, Tomas Vondra wrote:
>> On 25.12.2014 21:14, Andres Freund wrote:
>>
>>> That's indeed odd. Seems to have been lost when the statsfile was
>>> split into multiple files. Alvaro, Tomas?
>>
>> The goal was to keep the logic as close to the original as possible.
>> IIRC there were "pgstat wait timeout" issues before, and in most cases
>> the conclusion was that it's probably because of overloaded I/O.
>>
>> But maybe there actually was another bug, and it's entirely possible
>> that the split introduced a new one, and that's what we're seeing now.
>> The strange thing is that the split happened ~2 years ago, which is
>> inconsistent with the sudden increase in this kind of issue. So maybe
>> something changed on that particular animal (a failing SD card causing
>> I/O stalls, perhaps)?
>>
>> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
>> and analyze the issue locally. But that won't happen until January.
>
> I've tried to reproduce this on my Raspberry PI 'machine' and it's not
> very difficult to trigger this. About 7 out of 10 'make check' runs fail
> because of 'pgstat wait timeout'.
>
> All the occurrences I've seen were right after some sort of VACUUM
> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
> looked something like this:
>
> Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await   svctm   %util
> mmcblk0    0.00   75.00  0.00  8.00   0.00  36.00      9.00      5.73  15633.75     0.00  15633.75  125.00  100.00
>
> So pretty terrible (this is a Class 4 SD card, supposedly able to handle
> 4 MB/s). If hamster had a faulty SD card, it might have been much worse, I
> guess.

In my experience, at least a class 10 card is necessary, along with a
minimum amount of memory, to minimize the occurrence of those warnings.
hamster now has an 8GB class 10 card.
--
Michael
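
For readers who want to put numbers on the quoted iostat sample, here is a minimal Python sketch of the arithmetic. The figures are copied from the sample above. The helper name `summarize` is hypothetical, and reading "4 MB/s" as 4096 kB/s is an assumption. The class rating refers to sequential writes, so comparing it with a small-write workload is only a rough indication.

```python
# Values copied from the quoted `iostat -x` sample for device mmcblk0.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rkB/s", "wkB/s",
           "avgrq-sz", "avgqu-sz", "await", "r_await", "w_await",
           "svctm", "%util"]
VALUES = [0.00, 75.00, 0.00, 8.00, 0.00, 36.00,
          9.00, 5.73, 15633.75, 0.00, 15633.75, 125.00, 100.00]

SECTOR_BYTES = 512            # iostat reports avgrq-sz in 512-byte sectors
NOMINAL_KB_PER_S = 4 * 1024   # assumption: "4 MB/s" taken as 4096 kB/s


def summarize(sample):
    """Derive a few easier-to-read figures from one iostat row (hypothetical helper)."""
    iops = sample["r/s"] + sample["w/s"]
    kb_per_s = sample["rkB/s"] + sample["wkB/s"]
    # Average request size, first in kB, then converted to sectors.
    avg_request_kb = kb_per_s / iops if iops else 0.0
    avg_request_sectors = avg_request_kb * 1024 / SECTOR_BYTES
    return {
        "avg_request_kb": avg_request_kb,
        "avg_request_sectors": avg_request_sectors,
        # w_await is reported in milliseconds.
        "write_wait_seconds": sample["w_await"] / 1000.0,
        "fraction_of_nominal": sample["wkB/s"] / NOMINAL_KB_PER_S,
    }


sample = dict(zip(COLUMNS, VALUES))
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The derived request size (9 sectors, i.e. 4.5 kB) agrees with the avgrq-sz column. The sample shows writes waiting more than 15 seconds on average while the card delivers under 1% of the assumed nominal throughput.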
