Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Date: 2014-11-26 20:45:12
Message-ID: 54763BD8.8020906@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On 26.11.2014 21:07, Maxim Boguk wrote:
>
> On one of my servers stat collector went crasy on idle system.
> It constantly write around 50MB/s on idle database (no
> activity at all).
>
>
> OK there are results of some entry-level gdb kung-fu:
>
> problem point is:
> /*
> * Write the stats file if a new request has arrived
> that is not
> * satisfied by existing file.
> */
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
>
> on some loop iteration difference between last_statrequest and
> last_statwrite is around 23000-29000 (e.g. 20-30ms).
>
> Hm may be there some clock drift between CPU cores?
>
>
> What's made this situation interesting that the fact the code never got
> inside
> /*
> * If there is clock skew between backends and the collector, we
> could
> * receive a stats request time that's in the future. If so,
> complain
> * and reset last_statrequest. Resetting ensures that no inquiry
> * message can cause more than one stats file write to occur.
> */
> if (last_statrequest > last_statwrite)
> ...
> block.

That suggests there's no (significant) clock skew between postmaster and
the process requesting the file.

> Watchpoint on the (last_statwrite < last_statrequest) condition
> repetable fire ONLY inside:
> pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
> {
> if (msg->inquiry_time > last_statrequest)
> last_statrequest = msg->inquiry_time;
> }
> function.

OK, that's expected, because the request is sent only if the file is
older than X miliseconds (10ms for autovacuum, 640ms otherwise). So it
would be strange if the condition didn't fire here.

> Now just some musings/theory (I might be dead wrong of course).
> Lets suppose there are no clock skew actually and check execution flow.
>
> Main loop executed:
> if (last_statwrite < last_statrequest)
> pgstat_write_statsfile(false);
> and updated last_statwrite value.
> However, it took some time because of file size.

OK. FWIW, it's updated to GetCurrentTimestamp() before starting to write
the file.

> During that period (while pgstat busy with writes) some backend issued
> PGSTAT_MTYPE_INQUIRY request which will be succesfuly accepted on the
> next code lines at
> len = recv(pgStatSock, (char *) &msg,...
> And as a result the last_statrequest will be updated to the
> msg->inquiry_time value.
> And this value will be definitely > than last_statwrite (because this
> request issued after pgstat worker stared writing new copy of the stat
> file).

How many backends are there, requesting a fresh stats file? I thought
you mentioned the system is rather idle?

A regular backend won't request a fresh copy unless the current file is
older than 640 ms. So that can't cause ~50MB of writes (it'd require 10
writes per second). Maybe there are multiple backends sending such
requests, but I don't see how that could cause a continuous write load
(I'd understand a short spike at the beginning, and then getting
synchronized).

Autovacuum launcher / workers might cause that, because it's using 10 ms
threshold. But that'd require a lot of autovacuum processes running in
parallel, I guess.

I wonder if this could be caused by a long "queue" of such inquiries,
but that should be filtered out by the (last_statwrite <
last_statrequest) condition.

regards
Tomas

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Maxim Boguk 2014-11-26 21:19:32 Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Previous Message Jeff Janes 2014-11-26 20:29:59 Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-11-26 20:48:42 Re: bug in json_to_record with arrays
Previous Message Peter Eisentraut 2014-11-26 20:45:06 Re: [pgsql-packagers] Palle Girgensohn's ICU patch