Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system
Date: 2013-02-16 15:41:56
Message-ID: 511FA8C4.5090406@fuzzy.cz
Lists: pgsql-hackers

On 15.2.2013 01:02, Tomas Vondra wrote:
> On 14.2.2013 22:24, Alvaro Herrera wrote:
>> Alvaro Herrera wrote:
>>> Here's a ninth version of this patch. (version 8 went unpublished). I
>>> have simplified a lot of things and improved some comments; I think I
>>> understand much of it now. I think this patch is fairly close to
>>> committable, but one issue remains, which is this bit in
>>> pgstat_write_statsfiles():
>>
>> I've marked this as Waiting on author for the time being. I'm going to
>> review/work on other patches now, hoping that Tomas will post an updated
>> version in time for it to be considered for 9.3.
>
> Sadly I have no idea how to fix that, and I think the solution you
> suggested in the previous messages does not actually do the trick :-(

I've been thinking about this (I actually had a really weird dream about
it last night) and I think it might work like this:

(1) check the timestamp of the global file -> if it's too old, we need
to send an inquiry or wait a bit longer

(2) if it's new enough, we need to read it and look for that particular
    database - if it's not found, we have no info about it yet (this is
    the case handled by the dummy files)

(3) if there's a database stat entry, we need to check the timestamp
    of its last write -> if it's too old, send an inquiry and wait a
    bit longer

(4) well, we have a recent global file, it contains the database stat
entry and it's fresh enough -> tadaaaaaa, we're done

At least that's my idea - I haven't tried to implement it yet.
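
To make the four steps concrete, here's a minimal C sketch of the check
I have in mind. All the helper names (read_global_stats_timestamp,
find_db_entry, send_inquiry) and the STATS_MAX_AGE_MS threshold are made
up just for illustration - this is not actual pgstat.c code:

#include <stdbool.h>

#define STATS_MAX_AGE_MS 500    /* assumed freshness threshold */

typedef struct DbStatsEntry
{
    unsigned int databaseid;
    long long    stats_timestamp;   /* last time this entry was written */
} DbStatsEntry;

/* assumed helpers, standing in for reads of the global stats file */
extern long long     read_global_stats_timestamp(void);
extern DbStatsEntry *find_db_entry(unsigned int databaseid);
extern long long     current_time_ms(void);
extern void          send_inquiry(unsigned int databaseid);

static bool
db_stats_are_fresh(unsigned int databaseid)
{
    long long     now = current_time_ms();
    DbStatsEntry *entry;

    /* (1) global file too old -> send an inquiry and wait a bit longer */
    if (now - read_global_stats_timestamp() > STATS_MAX_AGE_MS)
    {
        send_inquiry(databaseid);
        return false;
    }

    /*
     * (2) fresh global file, but no entry for this database -> we have
     *     no stats for it at all (the case the dummy files handled)
     */
    entry = find_db_entry(databaseid);
    if (entry == NULL)
        return true;

    /* (3) entry exists, but was written too long ago -> inquire and wait */
    if (now - entry->stats_timestamp > STATS_MAX_AGE_MS)
    {
        send_inquiry(databaseid);
        return false;
    }

    /* (4) recent global file with a fresh entry -> we're done */
    return true;
}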

I see a few pros and cons of this approach:

pros:

* no dummy files
* no timestamps in the per-db files (and thus no sync issues)

cons:

* the backends / workers will have to re-read the global file just to
check that the per-db file is there and is fresh enough

So far it was sufficient just to peek at the timestamp at the beginning
of the per-db stat file - minimal data read, no CPU-expensive processing,
etc. Sadly, the more DBs there are, the larger the global file gets (and
thus the more overhead to read it).
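
For contrast, the old check amounted to roughly this minimal peek (the
file layout with a format id followed by the timestamp is an assumption
for the sketch, not the exact pgstat file format):

#include <stdio.h>

/* read just the timestamp stored near the start of a per-db stats file */
static long long
peek_per_db_timestamp(const char *path)
{
    FILE     *f = fopen(path, "rb");
    long long ts = -1;
    int       format_id;

    if (f == NULL)
        return -1;

    /* assumed layout: a format id, then the write timestamp */
    if (fread(&format_id, sizeof(format_id), 1, f) == 1)
    {
        if (fread(&ts, sizeof(ts), 1, f) != 1)
            ts = -1;
    }

    fclose(f);
    return ts;
}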

OTOH it's not that much data (~180B per entry, so with 1000 DBs it's
just ~180kB), so I don't expect this to be a tremendous issue. And the
pros seem quite compelling.

Tomas
