Re: Proposal for disk quota feature

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hubert Zhang <hzhang(at)pivotal(dot)io>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for disk quota feature
Date: 2018-09-22 03:17:14
Message-ID: CAFj8pRBvNDV8C_AGkYwJ5zGbTA+pEDHsXBRrnwZwWZsqJzSaLg@mail.gmail.com
Lists: pgsql-hackers

pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang(at)pivotal(dot)io> napsal:

> just fast reaction - why a QUOTA object?
>> Isn't ALTER SET enough?
>> Something like
>> ALTER TABLE a1 SET quota = 1MB;
>> ALTER USER ...
>> ALTER SCHEMA ...
>> A new DDL command looks like too hard a hammer.
>
>
> It's an option, but I prefer to keep the quota settings stored together:
> the CREATE DISK QUOTA approach more naturally stores quota settings in a
> separate pg_diskquota catalog, while the ALTER SET approach is closer to
> storing quota settings in pg_class, pg_authid, pg_namespace etc. in an
> integrated way. (Note that "naturally"/"closer" is not a hard
> requirement; ALTER SET could also store its settings in pg_diskquota,
> and vice versa.)
>

I don't have a problem with a new special table for storing this
information, but it looks redundant with the current GUC configuration and
limits. It can get messy to do some work with ALTER ROLE and other work via
CREATE QUOTA.

Regards

Pavel

> Here are some differences I can think of:
> 1. pg_authid is a global (shared) catalog, not per-database. It is harder
> to track a user's disk usage across the whole cluster (consider 1000+
> databases), so the semantics of CREATE DISK QUOTA ON USER are limited: it
> only tracks the user's disk usage inside the current database.
> 2. A separate pg_diskquota catalog can hold more fields besides the quota
> limit without adding too many columns to pg_class, e.g. a red zone to
> warn the user, or the current disk usage of the db objects.
>
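For what it's worth, the storage trade-off can be sketched with a toy model. This is purely illustrative Python: the type constants, oids and field names below are invented, loosely echoing the pg_diskquota sketch elsewhere in the thread, and are not the actual catalog definition.

```python
# Toy model of the "separate catalog" option: one row per quota, keyed by
# (quota type, target oid), independent of pg_class/pg_authid. All names
# and oids here are made up for illustration.

QUOTA_TYPE_TABLE = 0
QUOTA_TYPE_SCHEMA = 1
QUOTA_TYPE_ROLE = 2

pg_diskquota = {}  # (quotatype, targetoid) -> quota settings

def create_disk_quota(name, quotatype, targetoid, limit_mb, redzone_mb=None):
    """Model of CREATE DISK QUOTA ... ON ... WITH (quota = ...)."""
    pg_diskquota[(quotatype, targetoid)] = {
        "quotaname": name,
        "quotalimit": limit_mb,
        # Extra per-quota fields (red zone, current usage, ...) are cheap to
        # add here; adding them to pg_class, pg_authid and pg_namespace
        # would mean touching three catalogs.
        "quotaredzone": redzone_mb if redzone_mb is not None else limit_mb,
    }

def lookup_quota(quotatype, targetoid):
    return pg_diskquota.get((quotatype, targetoid))

create_disk_quota("tablea1", QUOTA_TYPE_TABLE, 16384, limit_mb=1)
create_disk_quota("schemas1", QUOTA_TYPE_SCHEMA, 2200, limit_mb=3, redzone_mb=2)

print(lookup_quota(QUOTA_TYPE_TABLE, 16384)["quotalimit"])    # 1
print(lookup_quota(QUOTA_TYPE_SCHEMA, 2200)["quotaredzone"])  # 2
```

The only point of the model is that extra per-quota fields live in one place instead of being spread across several existing catalogs.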
> On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>>
>>
>> pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang(at)pivotal(dot)io>
>> napsal:
>>
>>>
>>> Hi all,
>>>
>>> We redesigned the disk quota feature based on the comments from Pavel
>>> Stehule and Chapman Flack. Here is the new design.
>>>
>>> Overview
>>>
>>> Basically, the disk quota feature is meant to support multi-tenant
>>> environments: different levels of database objects can be given a quota
>>> limit to avoid overuse of disk space. A common case is as follows: the
>>> DBA enables disk quota on a specified list of databases and sets disk
>>> quota limits for tables/schemas/roles in those databases. A separate
>>> disk quota worker process monitors the disk usage of these objects and
>>> detects the objects which exceed their quota limit. Queries loading
>>> data into these "out of disk quota" tables/schemas/roles are cancelled.
>>>
>>> We are currently at the initial implementation stage. We would like to
>>> propose our idea first and get feedback from the community so we can
>>> iterate quickly.
>>>
>>> SQL Syntax (How to use disk quota)
>>>
>>> 1. Specify the databases with disk quota enabled in the GUC
>>> "diskquota_databases" in postgresql.conf and restart the database.
>>> 2. The DBA can set a disk quota limit for a table/schema/role:
>>>
>>> CREATE DISK QUOTA tablea1 ON TABLE a1 WITH (quota = '1MB');
>>> CREATE DISK QUOTA roleu1 ON USER u1 WITH (quota = '1GB');
>>> CREATE DISK QUOTA schemas1 ON SCHEMA s1 WITH (quota = '3MB');
>>>
>>
>> just fast reaction - why QUOTA object?
>>
>> Isn't ALTER SET enough?
>>
>> Some like
>>
>> ALTER TABLE a1 SET quote = 1MB;
>> ALTER USER ...
>> ALTER SCHEMA ..
>>
>> New DDL commans looks like too hard hammer .
>>
>>
>>
>>>
>>> 3. Simulate a schema-out-of-quota case; suppose tables a1 and a2 are
>>> both under schema s1:
>>>
>>> INSERT INTO a1 SELECT generate_series(1,1000);
>>> INSERT INTO a2 SELECT generate_series(1,3000000);
>>> SELECT pg_sleep(5);
>>> INSERT INTO a1 SELECT generate_series(1,1000);
>>> ERROR: schema's disk space quota exceeded
>>> DROP TABLE a2;
>>> SELECT pg_sleep(5);
>>> INSERT INTO a1 SELECT generate_series(1,1000);
>>> INSERT 0 1000
>>>
>>> Architecture
>>>
>>> Disk quota has the following components.
>>>
>>> 1. Quota Setting Store is where the disk quota settings are stored and
>>> accessed. We plan to use a catalog table pg_diskquota to store this
>>> information. pg_diskquota looks like:
>>>
>>> CATALOG(pg_diskquota,6122,DiskQuotaRelationId)
>>> {
>>>     NameData quotaname;      /* diskquota name */
>>>     int16    quotatype;      /* diskquota type */
>>>     Oid      quotatargetoid; /* diskquota target db object oid */
>>>     int32    quotalimit;     /* diskquota size limit in MB */
>>>     int32    quotaredzone;   /* diskquota red zone in MB */
>>> } FormData_pg_diskquota;
>>>
>>> 2. Quota Change Detector monitors size changes of database objects. We
>>> plan to use the stats collector to detect the 'active' table list at
>>> the initial stage, but the stats collector has some limitations in
>>> finding an active table that is inside a running transaction. See the
>>> TODO section for details.
>>>
>>> 3. Quota Size Checker calculates the sizes of database objects and
>>> compares them with the quota limits. Following Pavel's comment, the
>>> autovacuum launcher and worker processes are a good reference for disk
>>> quota, so we plan to use a disk quota launcher daemon process and
>>> several disk quota worker processes to do this work. The launcher
>>> process is responsible for starting worker processes based on a
>>> user-defined database list from a GUC. A worker process connects to its
>>> target database and monitors the disk usage of objects in that
>>> database. In its init stage, a worker process calls
>>> calculate_total_relation_size() to calculate the size of each user
>>> table. After the init stage, the worker process refreshes the disk
>>> model every N seconds. A refresh only recalculates the sizes of tables
>>> on the 'active' table list, which is generated by the Quota Change
>>> Detector, to minimize the cost.
>>>
>>> 4. Quota Enforcement Operator checks the quota limits on the postgres
>>> backend side. We will first implement it in ExecCheckRTPerms() as
>>> pre-running enforcement: it checks the disk quota of tables being
>>> inserted into or updated, and reports an error if the quota limit of
>>> the table, its schema, or its owner is exceeded. As a native feature,
>>> we plan to add more checkpoints to do running-query enforcement. For
>>> example, if a disk quota has 10MB left, a query could still insert 1GB
>>> of data; such a query would be allowed by the pre-running enforcement
>>> check but would be cancelled by the running-query enforcement check,
>>> which improves the accuracy of disk quota usage. To achieve this, we
>>> plan to add a checkpoint in a lower-level API such as smgrextend(), so
>>> that the Quota Enforcement Operator checks the disk quota usage
>>> whenever smgrextend() is called; if the quota is over the limit, the
>>> current query is cancelled.
>>>
>>> Highlights
>>>
>>> 1. Native feature. Native CREATE/DROP DISK QUOTA SQL statements are
>>> supported, with a new catalog table pg_diskquota to store disk quota
>>> settings.
>>>
>>> 2. Automatic DML/DDL detection. Table
>>> create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change
>>> and schema/role create/drop are detected by disk quota automatically.
>>>
>>> 3. Low-cost disk quota checker. The disk quota worker process needs to
>>> refresh the disk usage model every N seconds. Since recalculating file
>>> sizes with the stat() system call is expensive for a large number of
>>> files, we use an 'active' table list to reduce the real work at each
>>> iteration. A basic experiment with our initial implementation on a
>>> database with 20K tables shows that a refresh costs 1% CPU usage and
>>> finishes within 50ms.
>>>
>>> Todo/Limitations
>>>
>>> Before we propose our patch, we plan to enhance it with the following
>>> ideas:
>>>
>>> 1. Setting the list of databases with disk quota enabled dynamically,
>>> without restarting the database. Since we have the disk quota launcher
>>> process, it could detect changes to the 'diskquota_databases' list and
>>> start/stop the corresponding disk quota worker processes.
>>>
>>> 2. Enforcement while a query is running. Consider the case where 10MB
>>> of quota is left but the next query will insert 10GB of data; the
>>> current enforcement design allows this query to execute. This is
>>> limited by the fact that the 'active' table list is generated by the
>>> stats collector, and a postgres backend only sends table stats to the
>>> collector when the transaction ends. We need a new way to detect an
>>> 'active' table even while it is being modified inside a running
>>> transaction.
>>>
>>> 3. Monitoring an unlimited number of databases. Currently we cap the
>>> number of disk quota worker processes at 10 to limit the impact on
>>> normal workload. But what if we want to monitor the disk quota of more
>>> than 10 databases? Our solution is to let the disk quota launcher
>>> manage a queue of databases that need to be monitored. Disk quota
>>> worker processes consume the queue and refresh the disk usage/quota for
>>> their database; after some period, a worker returns its database to the
>>> queue and fetches the database at the head of the queue to process. The
>>> period determines the delay in detecting disk quota changes. To
>>> implement this feature, we need to support a postmaster subprocess
>>> rebinding to another database instead of the database bound in
>>> InitPostgres().
>>>
>>> 4. Supporting active table detection for VACUUM FULL and VACUUM
>>> ANALYZE. Currently these are not tracked by the stats collector.
>>>
>>> Thanks to Heikki, Pavel Stehule and Chapman Flack for the earlier
>>> comments on the disk quota feature. Any comments on how to improve it
>>> are appreciated.
>>>
>>>
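The worker/blacklist split described above (a Quota Size Checker maintaining a blacklist, and a Quota Enforcement Operator consulting it before a query runs) can be sketched as follows. This is a minimal Python model, not the proposed C implementation; the schema, table names and sizes are made up, and the shared-memory blacklist is just a set.

```python
# Sketch of Quota Size Checker + Quota Enforcement Operator interaction.
# All names and numbers are illustrative.

quota_limit_mb = {"s1": 3}            # schema -> quota limit in MB
table_schema = {"a1": "s1", "a2": "s1"}
table_size_mb = {"a1": 0, "a2": 0}    # refreshed by the worker
blacklist = set()                      # schemas currently over quota

def worker_refresh():
    """Quota Size Checker: recompute per-schema usage, update the blacklist.
    (The real design only recalculates tables on the 'active' list; this
    sketch just sums everything.)"""
    used = {}
    for tbl, size in table_size_mb.items():
        schema = table_schema[tbl]
        used[schema] = used.get(schema, 0) + size
    for schema, total in used.items():
        if total > quota_limit_mb.get(schema, float("inf")):
            blacklist.add(schema)
        else:
            blacklist.discard(schema)

def backend_insert(tbl, size_mb):
    """Quota Enforcement Operator: pre-running check against the blacklist."""
    if table_schema[tbl] in blacklist:
        raise RuntimeError("schema's disk space quota exceeded")
    table_size_mb[tbl] += size_mb

backend_insert("a2", 4)     # allowed: the worker has not refreshed yet
worker_refresh()            # worker notices s1 exceeds its 3MB quota
try:
    backend_insert("a1", 1)
except RuntimeError as e:
    print(e)                # schema's disk space quota exceeded
table_size_mb["a2"] = 0     # DROP TABLE a2
worker_refresh()            # s1 is back under quota
backend_insert("a1", 1)     # allowed again
```

Note how the first oversized insert succeeds: that is exactly the pre-running-only limitation the Todo section's running-query enforcement is meant to close.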
>>> On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>>> wrote:
>>>
>>>>
>>>>
>>>> 2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>>>>
>>>>> Thanks Pavel.
>>>>> Your patch does enforcement at the storage level (md.c, or we could
>>>>> also use smgrextend()). It's straightforward.
>>>>> But I prefer to implement disk quota as a feature with the following
>>>>> objectives:
>>>>> 1. Set/alter disk quota settings on different database objects, e.g.
>>>>> user, database, schema etc.: not only a general GUC, but a separate
>>>>> quota limit for specific objects.
>>>>> 2. The enforcement operator should work at two points: before a query
>>>>> runs and while a query is running. The latter's implementation may be
>>>>> similar to your patch.
>>>>>
>>>>
>>>> The patch was just an example. The resource quotas should be more
>>>> complex: per partition, table, schema, database, user. So GUCs are
>>>> possible, but not very user friendly.
>>>>
>>>> Our case is specific, but not too much. The servers are used for
>>>> multidimensional analyses, and some tables can grow too fast (COPY,
>>>> INSERT SELECT). We need to enforce limits immediately. The
>>>> implementation is simple, so I did it. The same implementation at the
>>>> database or schema level needs some more locks, so it will not be very
>>>> effective. Resource management can be very complex, and I expect it
>>>> will be hard work.
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
>>>>
>>>>> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com
>>>>> > wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>>>>>>
>>>>>>> Thanks Chapman.
>>>>>>> @Pavel, could you please explain more about your second suggestion,
>>>>>>> "implement some quotas on storage level"?
>>>>>>>
>>>>>>
>>>>>> See attached patch - it is very simple - and good enough for our
>>>>>> purposes.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Pavel
>>>>>>
>>>>>>
>>>>>>
>>>>>>> We will not keep long-lived processes attached to all databases
>>>>>>> (just like the servers with thousands of databases you mentioned).
>>>>>>> And you are right, we could share ideas with the autovacuum process
>>>>>>> and fork worker processes as needed:
>>>>>>> "autovacuum checks for tables that have had a large number of
>>>>>>> inserted, updated or deleted tuples. These checks use the
>>>>>>> statistics collection facility."
>>>>>>> The diskquota process is similar to autovacuum in caring about
>>>>>>> inserts, but the difference is that it also cares about VACUUM
>>>>>>> FULL, TRUNCATE and DROP, while UPDATE and DELETE may not be of
>>>>>>> interest since no file change happens. So a separate diskquota
>>>>>>> process is preferred.
>>>>>>>
>>>>>>> So if we implement disk quota as a full native feature, in the
>>>>>>> first version I prefer to implement the following:
>>>>>>> 1. Fork a diskquota launcher process under the postmaster
>>>>>>> ServerLoop; this process is long-lived.
>>>>>>> 2. The diskquota launcher process is responsible for creating a
>>>>>>> diskquota worker process for every database.
>>>>>>> 3. Diskquota settings are stored in a separate catalog table for
>>>>>>> each database.
>>>>>>> 4. Initialization stage: the diskquota launcher process creates
>>>>>>> diskquota worker processes for all databases (traversing them like
>>>>>>> autovacuum). A worker process calculates the disk usage of db
>>>>>>> objects and reads their diskquota settings. If any db object
>>>>>>> exceeds its quota limit, it is put into the blacklist in shared
>>>>>>> memory, which is later used by the enforcement operator. The worker
>>>>>>> process exits when its work is done.
>>>>>>> 5. Running stage: the diskquota launcher process creates a
>>>>>>> diskquota worker process for any database with a large number of
>>>>>>> INSERT, COPY, TRUNCATE, DROP etc., or a CREATE DISK QUOTA
>>>>>>> statement. The worker process updates the file sizes of the db
>>>>>>> objects containing the result relation and compares them with the
>>>>>>> diskquota settings. Again, if a quota limit is exceeded, the
>>>>>>> objects are put into the blacklist, and removed from the blacklist
>>>>>>> vice versa. The worker process exits when its work is done, and a
>>>>>>> GUC controls how frequently a worker process is restarted for a
>>>>>>> specific database. As you know, this GUC also controls the
>>>>>>> enforcement delay.
>>>>>>> 6. Enforcement: when a postgres backend executes queries, it checks
>>>>>>> the blacklist in shared memory to determine whether a query is
>>>>>>> allowed (before execution) or needs to be rolled back (while
>>>>>>> executing).
>>>>>>>
>>>>>>> If we implement disk quota as an extension, we could just use a
>>>>>>> background worker to start the diskquota launcher process and have
>>>>>>> the launcher fork child diskquota worker processes with
>>>>>>> RegisterDynamicBackgroundWorker(), as suggested by @Chapman.
>>>>>>> Diskquota settings could be stored in a user table in a separate
>>>>>>> schema for each database (schema and table created by the CREATE
>>>>>>> EXTENSION statement), just like Heikki has done in the pg_quota
>>>>>>> project. But in that case, we need to create the extension in each
>>>>>>> database before a diskquota worker process can be set up for it.
>>>>>>>
>>>>>>> Any comments on the above design, and which is preferred as the
>>>>>>> POC, native feature or extension?
>>>>>>>
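The launcher/worker division of labor sketched in these points, combined with the rotating-queue idea from the proposal's Todo section for handling more databases than worker slots, might look roughly like this. Purely illustrative Python; the cap of 3 stands in for the proposed cap of 10, and the database names are made up.

```python
from collections import deque

# Illustrative launcher-side scheduling: keep at most MAX_WORKERS worker
# processes busy per round while rotating every monitored database
# through a queue, so no database is starved.

MAX_WORKERS = 3

def schedule_round(db_queue, max_workers=MAX_WORKERS):
    """Return the databases handed to workers this round; each database
    goes back to the tail of the queue after its worker finishes."""
    batch = []
    for _ in range(min(max_workers, len(db_queue))):
        db = db_queue.popleft()
        batch.append(db)     # a worker refreshes this db's disk usage model
        db_queue.append(db)  # then the db rejoins the queue for a later round
    return batch

queue = deque(["db0", "db1", "db2", "db3", "db4"])
print(schedule_round(queue))  # ['db0', 'db1', 'db2']
print(schedule_round(queue))  # ['db3', 'db4', 'db0']
```

The round period directly bounds the detection delay for any one database, which matches the observation in the thread that the rotation period determines how late a quota violation is noticed.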
>>>>>>>
>>>>>>> -- Hubert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
>>>>>>> pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <chap(at)anastigmatix(dot)net>:
>>>>>>>>
>>>>>>>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>>>>>>>
>>>>>>>>> > 2 Keep one worker process for each database. But using a
>>>>>>>>> > parent/global quota worker process to manage the lifecycle of
>>>>>>>>> > database level worker processes. It could handle the newly
>>>>>>>>> > created database (avoid restart database) and save resource
>>>>>>>>> > when a database is not used. But this needs to change worker
>>>>>>>>> > process to be hierarchical. Postmaster becomes the grandfather
>>>>>>>>> > of database level worker processes in this case.
>>>>>>>>>
>>>>>>>>> I am using background workers this way in 9.5 at $work.
>>>>>>>>>
>>>>>>>>> In my case, one worker lives forever, wakes up on a set period, and
>>>>>>>>> starts a short-lived worker for every database, waiting for each
>>>>>>>>> one before starting the next.
>>>>>>>>>
>>>>>>>>> It was straightforward to implement. Looking back over the code,
>>>>>>>>> I see the global worker assigns its own PID to
>>>>>>>>> worker.bgw_notify_pid
>>>>>>>>> of each of its children, and also obtains a handle for each child
>>>>>>>>> from RegisterDynamicBackgroundWorker().
>>>>>>>>>
>>>>>>>>> I imagine the global quota worker would prefer to start workers
>>>>>>>>> for every database and then just wait for notifications from any
>>>>>>>>> of them, but that seems equally straightforward at first glance.
>>>>>>>>>
>>>>>>>>
>>>>>>>> There are servers with thousands of databases. A worker per
>>>>>>>> database is not a good idea.
>>>>>>>>
>>>>>>>> It should share ideas and code with the autovacuum process.
>>>>>>>>
>>>>>>>> I am not sure how effective an implementation based on bgworkers
>>>>>>>> can be. On servers with a large set of databases and a large set
>>>>>>>> of tables, it can identify a too-big table too late.
>>>>>>>>
>>>>>>>> Isn't it better to implement some quotas at the storage level?
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Pavel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> -Chap
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>>
>>>>>>> Hubert Zhang
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks
>>>>>
>>>>> Hubert Zhang
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks
>>>
>>> Hubert Zhang
>>>
>>
>
> --
> Thanks
>
> Hubert Zhang
>
