Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Date: 2013-02-04 00:51:59
Message-ID: 510F062F.1080001@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3.2.2013 21:54, Jeff Janes wrote:
> On Sat, Feb 2, 2013 at 2:33 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On Sat, Jan 5, 2013 at 8:03 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>> On 3.1.2013 20:33, Magnus Hagander wrote:
>>>>
>>>> Yeah, +1 for a separate directory not in global.
>>>
>>> OK, I moved the files from "global/stat" to "stat".
>>
>> This has a warning:
>>
>> pgstat.c:5132: warning: 'pgstat_write_statsfile_needed' was used with
>> no prototype before its definition
>>
>> I plan to do some performance testing, but that will take a while so I
>> wanted to post this before I get distracted.
>
> Running "vacuumdb -a" on a cluster with 1000 db with 200 tables (x
> serial primary key) in each, I get log messages like this:
>
> last_statwrite 23682-06-18 22:36:52.960194-07 is later than
> collector's time 2013-02-03 12:49:19.700629-08 for db 16387
>
> Note the bizarre year in the first time stamp.
>
> If it matters, I got this after shutting down the cluster, blowing
> away $DATA/stat/*, then restarting it and invoking vacuumdb.

I somehow expected that hash_search zeroes all the fields of a new
entry, but looking at pgstat_get_db_entry that obviously is not the
case. So stats_timestamp (which tracks timestamp of the last write for a
DB) was random - that's where the bizarre year values came from.

I've added a proper initialization (to 0), and now it works as expected.

Although the whole sequence of errors I was getting was this:

LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than
collector's time 2013-02-04 00:54:21.113439+01 for db 19093
WARNING: pgstat wait timeout
LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than
collector's time 2013-02-04 00:54:31.424681+01 for db 46494
FATAL: could not find block containing chunk 0x2af4a60
LOG: statistics collector process (PID 10063) exited with exit code 1
WARNING: pgstat wait timeout
WARNING: pgstat wait timeout

I'm not entirely sure where the FATAL came from, but it seems it was
somehow related to the issue - it was quite reproducible, although I
don't see how exactly could this happen. There relevant block of code
looks like this:

char *writetime;
char *mytime;

/* Copy because timestamptz_to_str returns a static buffer */
writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
mytime = pstrdup(timestamptz_to_str(cur_ts));
elog(LOG, "last_statwrite %s is later than collector's time %s for "
"db %d", writetime, mytime, dbentry->databaseid);
pfree(writetime);
pfree(mytime);

which seems quite fine to mee. I'm not sure how one of the pfree calls
could fail?

Anyway, attached is a patch that fixes all three issues, i.e.

1) the un-initialized timestamp
2) the "void" ommited from the signature
3) rename to "pg_stat" instead of just "stat"

Tomas

Attachment Content-Type Size
stats-split-v6.patch text/x-diff 37.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-02-04 01:16:54 Re: cannot move relocatable extension out of pg_catalog schema
Previous Message Tomas Vondra 2013-02-04 00:50:33 Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system