Re: Control your disk usage in PG: Introduction to Disk Quota Extension

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)pivotal(dot)io>
Subject: Re: Control your disk usage in PG: Introduction to Disk Quota Extension
Date: 2019-01-30 02:26:52
Message-ID: CAB0yre=PbBY-obEFqmkLpSvL=VRx_D3DPsnBn7kAKOah8Z-aVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Michael, Robert
For you question about the hook position, I want to explain more about the
background why we want to introduce these hooks.
We wrote a diskquota extension <https://github.com/greenplum-db/diskquota>
for Postgresql(which is inspired by Heikki's pg_quota
<https://github.com/hlinnaka/pg_quota>). Diskquota extension is used to
control the disk usage in Postgresql in a fine-grained way, which means:
1. You could set disk quota limit at schema level or role level.
2. A background worker will gather the current disk usage for each
schema/role in realtime.
3. A background worker will generate the blacklist for schema/role whose
quota limit is exceeded.
4. New transaction want to insert data into the schema/role in the
blacklist will be cancelled.

In step 2, gathering the current disk usage for each schema needs to sum
disk size of all the tables in this schema. This is a time consuming
operation. We want to use hooks in SMGR to detect the Active Table, and
only recalculate the disk size of all the Active Tables.
For example, the smgrextend hook indicates that you allocate a new block
and the table need to be treated as Active Table.

Do you have some better hook positions recommend to solve the above user
case?
Thanks in advance.

Hubert

On Tue, Jan 22, 2019 at 12:08 PM Hubert Zhang <hzhang(at)pivotal(dot)io> wrote:

> > For this particular purpose, I don't immediately see why you need a
>> > hook in both places. If ReadBuffer is called with P_NEW, aren't we
>> > guaranteed to end up in smgrextend()?
>> Yes, that's a bit awkward.
>
>
> Hi Michael, we revisit the ReadBuffer hook and remove it in the latest
> patch.
> ReadBuffer hook is original used to do enforcement(e.g. out of diskquota
> limit) when query is loading data.
> We plan to put the enforcement work of running query to separate diskquota
> worker process.
> Let worker process to detect the backends to be cancelled and send SIGINT
> to these backends.
> So there is no need for ReadBuffer hook anymore.
>
> Our patch currently only contains smgr related hooks to catch the file
> change and get the Active Table list for diskquota extension.
>
> Thanks Hubert.
>
>
> On Mon, Jan 7, 2019 at 6:56 PM Haozhou Wang <hawang(at)pivotal(dot)io> wrote:
>
>> Thanks very much for your comments.
>>
>> To the best of my knowledge, smgr is a layer that abstract the storage
>> operations. Therefore, it is a good place to control or collect information
>> the storage operations without touching the physical storage layer.
>> Moreover, smgr is coming with actual disk IO operation (not consider the
>> OS cache) for postgres. So we do not need to worry about the buffer
>> management in postgres.
>> It will make the purpose of hook is pure: a hook for actual disk IO.
>>
>> Regards,
>> Haozhou
>>
>> On Wed, Dec 26, 2018 at 1:56 PM Michael Paquier <michael(at)paquier(dot)xyz>
>> wrote:
>>
>>> On Wed, Nov 21, 2018 at 09:47:44AM -0500, Robert Haas wrote:
>>> > +1 for adding some hooks to support this kind of thing, but I think
>>> > the names you've chosen are not very good. The hook name should
>>> > describe the place from which it is called, not the purpose for which
>>> > one imagines that it will be used, because somebody else might imagine
>>> > another use. Both BufferExtendCheckPerms_hook_type and
>>> > SmgrStat_hook_type are imagining that they know what the hook does -
>>> > CheckPerms in the first case and Stat in the second case.
>>>
>>> I personally don't mind making Postgres more pluggable, but I don't
>>> think that we actually need the extra ones proposed here at the layer
>>> of smgr, as smgr is already a layer designed to call an underlying set
>>> of APIs able to extend, unlink, etc. depending on the storage type.
>>>
>>> > For this particular purpose, I don't immediately see why you need a
>>> > hook in both places. If ReadBuffer is called with P_NEW, aren't we
>>> > guaranteed to end up in smgrextend()?
>>>
>>> Yes, that's a bit awkward.
>>> --
>>> Michael
>>
>>
>
> --
> Thanks
>
> Hubert Zhang
>

--
Thanks

Hubert Zhang

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-01-30 02:28:26 Re: jsonpath
Previous Message Andres Freund 2019-01-30 02:21:27 Re: ALTER SESSION