Re: Control your disk usage in PG: Introduction to Disk Quota Extension

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)pivotal(dot)io>
Subject: Re: Control your disk usage in PG: Introduction to Disk Quota Extension
Date: 2019-01-30 12:34:27
Message-ID: CAB0yre=+hPosdOh=xwcNXbKWJyFFf11s2PK_aAAT6ArQvcnPRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

Currently we add hooks in SMGR to detect which table is being modified(disk
size change).
Inserting rows into existing page with room will not introduce new block,
and thus should not be treated as active table. smgrextend is a good
position to meet this behavior.
We welcome suggestions on other better hook positions!

Besides, suppose we use smgr as hook position, I want to discuss the API
passed to the hook function.
Take smgrextend as example. The function interface of smgrextend is like
that:
```
void smgrextend
(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, char *buffer,
bool skipFsync);
```
So the hook api of smgrextend could have two main options.
Hook API Option1
```
typedef void (*smgrextend_hook_type)
(RelFileNode smgr_rnode,ForkNumber forknum);
```
Hook API Option 2
```
typedef void (*smgrextend_hook_type)
(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,char *buffer,
bool skipFsync);
```

As for Option1. Since diskquota extension only needs the relfilenode
information to detect the active tables, Option1 just pass the RelFileNode
to the hook function. It's more clear and has semantic meaning.

While Option 2 is to pass the original parameters to the hook functions
without any filter. This is more general and let the different hook
implementations to decide how to use these parameters.

Option 1 also needs some additional work to handle smgrdounlinkall case,
whose input parameter is the SMgrRelation list. We may need to palloc
Relfilenode list and pfree it manually.
smgrdounlinkall function interface:
```
smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo, char
*relstorages)
```

Based on the assumption we use smgr as hook position, hook API option1 or
option2 which is better?
Or we could find some balanced API between option1 and option2?

Again comments on other better hook positions are appreciated!

Thanks
Hubert

On Wed, Jan 30, 2019 at 10:26 AM Hubert Zhang <hzhang(at)pivotal(dot)io> wrote:

> Hi Michael, Robert
> For you question about the hook position, I want to explain more about the
> background why we want to introduce these hooks.
> We wrote a diskquota extension <https://github.com/greenplum-db/diskquota>
> for Postgresql(which is inspired by Heikki's pg_quota
> <https://github.com/hlinnaka/pg_quota>). Diskquota extension is used to
> control the disk usage in Postgresql in a fine-grained way, which means:
> 1. You could set disk quota limit at schema level or role level.
> 2. A background worker will gather the current disk usage for each
> schema/role in realtime.
> 3. A background worker will generate the blacklist for schema/role whose
> quota limit is exceeded.
> 4. New transaction want to insert data into the schema/role in the
> blacklist will be cancelled.
>
> In step 2, gathering the current disk usage for each schema needs to sum
> disk size of all the tables in this schema. This is a time consuming
> operation. We want to use hooks in SMGR to detect the Active Table, and
> only recalculate the disk size of all the Active Tables.
> For example, the smgrextend hook indicates that you allocate a new block
> and the table need to be treated as Active Table.
>
> Do you have some better hook positions recommend to solve the above user
> case?
> Thanks in advance.
>
> Hubert
>
>
>
>
>
> On Tue, Jan 22, 2019 at 12:08 PM Hubert Zhang <hzhang(at)pivotal(dot)io> wrote:
>
>> > For this particular purpose, I don't immediately see why you need a
>>> > hook in both places. If ReadBuffer is called with P_NEW, aren't we
>>> > guaranteed to end up in smgrextend()?
>>> Yes, that's a bit awkward.
>>
>>
>> Hi Michael, we revisit the ReadBuffer hook and remove it in the latest
>> patch.
>> ReadBuffer hook is original used to do enforcement(e.g. out of diskquota
>> limit) when query is loading data.
>> We plan to put the enforcement work of running query to separate
>> diskquota worker process.
>> Let worker process to detect the backends to be cancelled and send SIGINT
>> to these backends.
>> So there is no need for ReadBuffer hook anymore.
>>
>> Our patch currently only contains smgr related hooks to catch the file
>> change and get the Active Table list for diskquota extension.
>>
>> Thanks Hubert.
>>
>>
>> On Mon, Jan 7, 2019 at 6:56 PM Haozhou Wang <hawang(at)pivotal(dot)io> wrote:
>>
>>> Thanks very much for your comments.
>>>
>>> To the best of my knowledge, smgr is a layer that abstract the storage
>>> operations. Therefore, it is a good place to control or collect information
>>> the storage operations without touching the physical storage layer.
>>> Moreover, smgr is coming with actual disk IO operation (not consider the
>>> OS cache) for postgres. So we do not need to worry about the buffer
>>> management in postgres.
>>> It will make the purpose of hook is pure: a hook for actual disk IO.
>>>
>>> Regards,
>>> Haozhou
>>>
>>> On Wed, Dec 26, 2018 at 1:56 PM Michael Paquier <michael(at)paquier(dot)xyz>
>>> wrote:
>>>
>>>> On Wed, Nov 21, 2018 at 09:47:44AM -0500, Robert Haas wrote:
>>>> > +1 for adding some hooks to support this kind of thing, but I think
>>>> > the names you've chosen are not very good. The hook name should
>>>> > describe the place from which it is called, not the purpose for which
>>>> > one imagines that it will be used, because somebody else might imagine
>>>> > another use. Both BufferExtendCheckPerms_hook_type and
>>>> > SmgrStat_hook_type are imagining that they know what the hook does -
>>>> > CheckPerms in the first case and Stat in the second case.
>>>>
>>>> I personally don't mind making Postgres more pluggable, but I don't
>>>> think that we actually need the extra ones proposed here at the layer
>>>> of smgr, as smgr is already a layer designed to call an underlying set
>>>> of APIs able to extend, unlink, etc. depending on the storage type.
>>>>
>>>> > For this particular purpose, I don't immediately see why you need a
>>>> > hook in both places. If ReadBuffer is called with P_NEW, aren't we
>>>> > guaranteed to end up in smgrextend()?
>>>>
>>>> Yes, that's a bit awkward.
>>>> --
>>>> Michael
>>>
>>>
>>
>> --
>> Thanks
>>
>> Hubert Zhang
>>
>
>
> --
> Thanks
>
> Hubert Zhang
>

--
Thanks

Hubert Zhang

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-01-30 13:11:01 Re: WIP: Avoid creation of the free space map for small tables
Previous Message Konstantin Knizhnik 2019-01-30 12:04:53 Replication & recovery_min_apply_delay