Re: [rfc] overhauling pgstat.stat

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [rfc] overhauling pgstat.stat
Date: 2013-09-04 12:36:51
Message-ID: DEB493DE-7A5F-4D22-A175-B902CF3549D3@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sent from my iPad

On 04-Sep-2013, at 15:39, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:

> (2013/09/04 15:23), Atri Sharma wrote:
>>
>>
>> Sent from my iPad
>>
>> On 04-Sep-2013, at 10:54, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
>>
>>> Hi,
>>>
>>> (2013/09/04 12:52), Atri Sharma wrote:
>>>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
>>>>> Hi,
>>>>>
>>>>> I'm considering overhauling pgstat.stat, and would like to know how many
>>>>> people are interested in this topic.
>>>>>
>>>>> As you may know, this file could be handreds of MB in size, because
>>>>> pgstat.stat holds all access statistics in each database, and it needs
>>>>> to read/write an entire pgstat.stat frequently.
>>>>>
>>>>> As a result, pgstat.stat often generates massive I/O operation,
>>>>> particularly when having a large number of tables in the database.
>>>>>
>>>>> To support multi-tenancy or just a large number of tables (up to 10k
>>>>> tables in single database), I think pgstat.stat needs to be overhauled.
>>>>>
>>>>> I think using heap and btree in pgstat.stat would be preferred to reduce
>>>>> read/write and to allow updating access statistics for specific tables
>>>>> in pgstat.stat file.
>>>>>
>>>>> Is this good for us?
>>>>
>>>> Hi,
>>>>
>>>> Nice thought. I/O reduction in pgstat can be really helpful.
>>>>
>>>> I am trying to think of our aim here. Would we be looking to split
>>>> pgstat per table, so that the I/O write happens for only a portion of
>>>> pgstat? Or reduce the I/O in general?
>>>
>>> I prefer the latter.
>>>
>>> Under the current implementation, DBA need to split single database
>>> into many smaller databases with considering access locality of the
>>> tables. It's difficult and could be change in future.
>>>
>>> And splitting the statistics data into many files (per table,
>>> for example) would cause another performance issue when
>>> collecting/showing statistics at once. Just my guess though.
>>>
>>> So, I'm looking for a new way to reduce I/O for the statistics data
>>> in general.
>>>
>>> Regards,
>>>
>>>>
>>>> If the later, how would using BTree help us? I would rather go for a
>>>> range tree or something. But again, I may be completely wrong.
>>>>
>>>> Please elaborate a bit more on the solution we are trying to
>>>> achieve.It seems really interesting.
>>>>
>>>> Regards,
>>>>
>>>> Atri
>>
>> Right,thanks.
>>
>> How would using heap and BTree help here? Are we looking at a priority queue which supports the main storage system of the stats?
>
> For example, when you read only a single block from your table,
> then you need to write all values in your database statistics next.
> It often generates large amount of i/o operation.
>
> However, if random access is allowed in the statistics, you can
> update only as single record for the specific table which you read.
> It would be less than 100 bytes for each table.
>
> I have no idea about how a priority queue can work here so far.
> However, if the statistics is overhauled, PostgreSQL would be able
> to host a much larger number of customers

Ah, now I get it. Thanks a ton for the detailed explanation.

Yes, a BTree will sufficiently isolate per table stats here and allow for random access.

Another thing I can think of is having a write back cache which could probably be used for a buffer before the actual stats write. I am just musing here though.

Regards,

Atri

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2013-09-04 12:41:44 Re: [rfc] overhauling pgstat.stat
Previous Message Stephen Frost 2013-09-04 12:13:17 Re: [rfc] overhauling pgstat.stat