Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Date: 2013-02-04 02:21:18
Message-ID: 510F1B1E.8080307@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1.2.2013 17:19, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>> diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
>> index be3adf1..4ec485e 100644
>> --- a/src/backend/postmaster/pgstat.c
>> +++ b/src/backend/postmaster/pgstat.c
>> @@ -64,10 +64,14 @@
>>
>> /* ----------
>> * Paths for the statistics files (relative to installation's $PGDATA).
>> + * Permanent and temprorary, global and per-database files.
>
> Note typo in the line above.
>
>> -#define PGSTAT_STAT_PERMANENT_FILENAME "global/pgstat.stat"
>> -#define PGSTAT_STAT_PERMANENT_TMPFILE "global/pgstat.tmp"
>> +#define PGSTAT_STAT_PERMANENT_DIRECTORY "stat"
>> +#define PGSTAT_STAT_PERMANENT_FILENAME "stat/global.stat"
>> +#define PGSTAT_STAT_PERMANENT_TMPFILE "stat/global.tmp"
>> +#define PGSTAT_STAT_PERMANENT_DB_FILENAME "stat/%d.stat"
>> +#define PGSTAT_STAT_PERMANENT_DB_TMPFILE "stat/%d.tmp"
>
>> +char *pgstat_stat_directory = NULL;
>> char *pgstat_stat_filename = NULL;
>> char *pgstat_stat_tmpname = NULL;
>> +char *pgstat_stat_db_filename = NULL;
>> +char *pgstat_stat_db_tmpname = NULL;
>
> I don't like the quoted parts very much; it seems awkward to have the
> snprintf patterns in one place and have them be used in very distant

I don't see that as particularly awkward, but that's a matter of taste.
I still see that as a bunch of constants that are sprintf patterns at
the same time.

> places. Is there a way to improve that? Also, if I understand clearly,
> the pgstat_stat_db_filename value needs to be an snprintf pattern too,
> right? What if it doesn't contain the required % specifier?

Ummmm, yes - it needs to be a pattern too, but the user specifies the
directory (stats_temp_directory) and this is used to derive all the
other values - see assign_pgstat_temp_directory() in guc.c.

> Also, if you can filter this through pgindent, that would be best. Make
> sure to add DBWriteRequest to src/tools/pgindent/typedefs_list.

Will do.

>
>> + /*
>> + * There's no request for this DB yet, so lets create it (allocate a
>> + * space for it, set the values).
>> + */
>> + if (last_statrequests == NULL)
>> + last_statrequests = palloc(sizeof(DBWriteRequest));
>> + else
>> + last_statrequests = repalloc(last_statrequests,
>> + (num_statrequests + 1)*sizeof(DBWriteRequest));
>> +
>> + last_statrequests[num_statrequests].databaseid = msg->databaseid;
>> + last_statrequests[num_statrequests].request_time = msg->clock_time;
>> + num_statrequests += 1;
>
> Having to repalloc this array each time seems wrong. Would a list
> instead of an array help? see ilist.c/h; I vote for a dlist because you
> can easily delete elements from the middle of it, if required (I think
> you'd need that.)

Thanks. I'm not very familiar with the list interface, so I've used
plain array. But yes, there are better ways than doing repalloc all the
time.

>
>> + char db_statfile[strlen(pgstat_stat_db_filename) + 11];
>> + snprintf(db_statfile, strlen(pgstat_stat_db_filename) + 11,
>> + pgstat_stat_filename, dbentry->databaseid);
>
> This pattern seems rather frequent. Can we use a macro or similar here?
> Encapsulating the "11" better would be good. Magic numbers are evil.

Yes, this needs to be cleaned / improved.

>> diff --git a/src/include/pgstat.h b/src/include/pgstat.h
>> index 613c1c2..b3467d2 100644
>> --- a/src/include/pgstat.h
>> +++ b/src/include/pgstat.h
>> @@ -205,6 +205,7 @@ typedef struct PgStat_MsgInquiry
>> PgStat_MsgHdr m_hdr;
>> TimestampTz clock_time; /* observed local clock time */
>> TimestampTz cutoff_time; /* minimum acceptable file timestamp */
>> + Oid databaseid; /* requested DB (InvalidOid => all DBs) */
>> } PgStat_MsgInquiry;
>
> Do we need to support the case that somebody requests stuff from the
> "shared" DB? IIRC that's what InvalidOid means in pgstat ...

Frankly, I don't know, but I guess we do because it was in the original
code, and there are such inquiries right after the database starts
(that's why I had to add pgstat_write_db_dummyfile).

Tomas

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2013-02-04 03:00:26 Re: lazy_vacuum_heap()'s removal of HEAPTUPLE_DEAD tuples
Previous Message Andrew Dunstan 2013-02-04 02:05:22 Re: json api WIP patch