Re: Proposal for disk quota feature

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hubert Zhang <hzhang(at)pivotal(dot)io>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for disk quota feature
Date: 2018-09-22 07:02:25
Message-ID: CAFj8pRDwn3z14C3P1QVL0qemQL5ky+Q6Rg6iy2tmw-Mpm0nz1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

so 22. 9. 2018 v 8:48 odesílatel Hubert Zhang <hzhang(at)pivotal(dot)io> napsal:

> But it looks like redundant to current GUC configuration and limits
>
> what do you mean by current GUC configuration? Is that the general block
> number limit in your patch? If yes, the difference between GUC and
> pg_diskquota catalog is that pg_diskquota will store different quota limit
> for the different role, schema or table instead of a single GUC value.
>

storage is not relevant in this moment.

I don't see to consistent to sets some limits via SET command, or ALTER X
SET, and some other with CREATE QUOTA ON.

The quotas or object limits, resource limits are pretty useful and
necessary, but I don't see these like new type of objects, it is much more
some property of current objects. Because we have one syntax for this
purpose I prefer it. Because is not good to have two syntaxes for similar
purpose.

So instead CREATE DISC QUATA ON SCHEMA xxx some value I prefer

ALTER SCHEMA xxx SET disc_quota = xxx;

The functionality is +/- same. But ALTER XX SET was introduce first, and I
don't feel comfortable to have any new syntax for similar purpose

Regards

Pavel

>
> On Sat, Sep 22, 2018 at 11:17 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>>
>>
>> pá 21. 9. 2018 v 16:21 odesílatel Hubert Zhang <hzhang(at)pivotal(dot)io>
>> napsal:
>>
>>> just fast reaction - why QUOTA object?
>>>> Isn't ALTER SET enough?
>>>> Some like
>>>> ALTER TABLE a1 SET quote = 1MB;
>>>> ALTER USER ...
>>>> ALTER SCHEMA ..
>>>> New DDL commans looks like too hard hammer .
>>>
>>>
>>> It's an option. Prefer to consider quota setting store together:
>>> CREATE DISK QUOTA way is more nature to store quota setting in a
>>> separate pg_diskquota catalog
>>> While ALTER SET way is more close to store quota setting in pg_class,
>>> pg_role, pg_namespace. etc in an integrated way.
>>> (Note that here I mean nature/close is not must, ALTER SET could also
>>> store in pg_diskquota and vice versa.)
>>>
>>
>> I have not a problem with new special table for storing this information.
>> But it looks like redundant to current GUC configuration and limits. Can be
>> messy do some work with ALTER ROLE, and some work via CREATE QUOTE.
>>
>> Regards
>>
>> Pavel
>>
>>
>>> Here are some differences I can think of:
>>> 1 pg_role is a global catalog, not per database level. It's harder to
>>> tracker the user's disk usage in the whole clusters(considering 1000+
>>> databases). So the semantic of CREATE DISK QUOTA ON USER is limited: it
>>> only tracks the user's disk usage inside the current database.
>>> 2 using separate pg_diskquota could add more field except for quota
>>> limit without adding too many fields in pg_class, e.g. red zone to give the
>>> user a warning or the current disk usage of the db objects.
>>>
>>> On Fri, Sep 21, 2018 at 8:01 PM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>>> wrote:
>>>
>>>>
>>>>
>>>> pá 21. 9. 2018 v 13:32 odesílatel Hubert Zhang <hzhang(at)pivotal(dot)io>
>>>> napsal:
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *Hi all,We redesign disk quota feature based on the comments from
>>>>> Pavel Stehule and Chapman Flack. Here are the new design.OverviewBasically,
>>>>> disk quota feature is used to support multi-tenancy environment, different
>>>>> level of database objects could be set a quota limit to avoid over use of
>>>>> disk space. A common case could be as follows: DBA could enable disk quota
>>>>> on a specified database list. DBA could set disk quota limit for
>>>>> tables/schemas/roles in these databases. Separate disk quota worker process
>>>>> will monitor the disk usage for these objects and detect the objects which
>>>>> exceed their quota limit. Queries loading data into these “out of disk
>>>>> quota” tables/schemas/roles will be cancelled.We are currently working at
>>>>> init implementation stage. We would like to propose our idea firstly and
>>>>> get feedbacks from community to do quick iteration.SQL Syntax (How to use
>>>>> disk quota)1 Specify the databases with disk quota enabled in GUC
>>>>> “diskquota_databases” in postgresql.conf and restart the database.2 DBA
>>>>> could set disk quota limit for table/schema/role.CREATE DISK QUOTA tablea1
>>>>> ON TABLE a1 with (quota = ‘1MB’);CREATE DISK QUOTA roleu1 ON USER u1 with
>>>>> (quota = ‘1GB’);CREATE DISK QUOTA schemas1 ON SCHEMA s1 with (quota =
>>>>> ‘3MB’);*
>>>>>
>>>>
>>>> just fast reaction - why QUOTA object?
>>>>
>>>> Isn't ALTER SET enough?
>>>>
>>>> Some like
>>>>
>>>> ALTER TABLE a1 SET quote = 1MB;
>>>> ALTER USER ...
>>>> ALTER SCHEMA ..
>>>>
>>>> New DDL commans looks like too hard hammer .
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *3 Simulate a schema out of quota limit case: suppose table a1 and
>>>>> table a2 are both under schema s1.INSERT INTO a1 SELECT
>>>>> generate_series(1,1000);INSERT INTO a2 SELECT
>>>>> generate_series(1,3000000);SELECT pg_sleep(5)INSERT INTO a1 SELECT
>>>>> generate_series(1,1000);ERROR: schema's disk space quota exceededDROP
>>>>> TABLE a2;SELECT pg_sleep(5)INSERT INTO a1 SELECT
>>>>> generate_series(1,1000);INSERT 0 1000ArchitectureDisk quota has the
>>>>> following components.1. Quota Setting Store is where the disk quota setting
>>>>> to be stored and accessed. We plan to use catalog table pg_diskquota to
>>>>> store these information. pg_diskquota is
>>>>> like:CATALOG(pg_diskquota,6122,DiskQuotaRelationId){ NameData quotaname; /*
>>>>> diskquota name */ int16 quotatype; /* diskquota type name */ Oid
>>>>> quotatargetoid; /* diskquota target db object oid*/ int32 quotalimit; /*
>>>>> diskquota size limit in MB*/ int32 quotaredzone; /* diskquota redzone in
>>>>> MB*/} FormData_pg_diskquota;2. Quota Change Detector is the monitor of size
>>>>> change of database objects. We plan to use stat collector to detect the
>>>>> ‘active’ table list at initial stage. But stat collector has some
>>>>> limitation on finding the active table which is in a running transaction.
>>>>> Details see TODO section.3. Quota Size Checker is where to calculate the
>>>>> size and compare with quota limit for database objects. According to
>>>>> Pavel’s comment, autovacuum launcher and worker process could be a good
>>>>> reference to disk quota. So we plan to use a disk quota launcher daemon
>>>>> process and several disk quota worker process to finish this work. Launcher
>>>>> process is responsible for starting worker process based on a user defined
>>>>> database list from GUC. Worker process will connect to its target database
>>>>> and monitor the disk usage for objects in this database. In init stage of
>>>>> worker process, it will call calculate_total_relation_size() to calculate
>>>>> the size for each user table. After init stage, worker process will refresh
>>>>> the disk model every N seconds. Refreshing will only recalculate the size
>>>>> of tables in ‘active’ table list, which is generated by Quata Change
>>>>> Detector to minimize the cost.4. Quota Enforcement Operator is where to
>>>>> check for the quota limitation at postgres backend side. We will firstly
>>>>> implement it in ExecCheckRTPerms() as pre-running enforcement. It will
>>>>> check the disk quota of tables being inserted or updated, and report error
>>>>> if table’s or table’s schema’s or table’s owner’s quota limit is exceeded.
>>>>> As a native feature, we plan to add more checkpoint to do running query
>>>>> enforcement. For example, if a disk quota lefts 10MB quota, a query could
>>>>> insert 1GB data. This query could be allowed in pre-running enforcement
>>>>> check, but will be cancelled in running query enforcement check. Therefore,
>>>>> it can improve the accurate of disk quota usage. To achieve this, we plan
>>>>> to add a checkpoint in lower API such as smgr_extened. Hence, the Quota
>>>>> Enforcement Operator will check the disk quota usage when smgr_extened is
>>>>> called. If the quota is over limited, current query will be cancelled.
>>>>> Highlight1. Native feature.Support native Create/Drop Disk Quota SQL
>>>>> statement.New catalog table pg_diskquota to store disk quota setting.2.
>>>>> Auto DML/DDL detection. Table
>>>>> create/update/insert/delete/vacuum/truncate/drop/schema_change/owner_change,
>>>>> Schema create/drop and Role create/drop will be detected by disk quota
>>>>> automatically. 3. Low cost disk quota checker.Worker process of disk quota
>>>>> need to refresh the disk usage model every N seconds. Since recalculate the
>>>>> file size using stat() system call is expensive for a large number of
>>>>> files, we use an ‘active’ table list to reduce the real work at each
>>>>> iteration. A basic experiment on our init stage implementation on database
>>>>> with 20K tables shows that the refresh cost is 1% cpu usage and will be
>>>>> finished within 50ms. Todo/LimitationBefore we propose our patch, we plan
>>>>> to enhance it with the following ideas:1. Setting database list with disk
>>>>> quota enabled dynamically without restart database. Since we have the disk
>>>>> quota launcher process, it could detect the new ‘diskquota_databases’ list
>>>>> and start/stop the corresponding disk quota worker process.2. Enforcement
>>>>> when query is running. Considering the case when there is 10MB quota left,
>>>>> but next query will insert 10GB data. Current enforcement design will allow
>>>>> this query to be executed. This is limited by the ‘active’ table detection
>>>>> is generated by stat collector. Postgres backend will only send table stat
>>>>> information to collector only when the transaction ends. We need a new way
>>>>> to detect the ‘active’ table even when this table is being modified inside
>>>>> a running transaction.3. Monitor unlimited number of databases. Current we
>>>>> set the max number of disk quota worker process to be 10 to reduce the
>>>>> affection normal workload. But how about if we want to monitor the disk
>>>>> quota of more than 10 databases? Our solution is to let disk quota launcher
>>>>> to manage a queue of database need to be monitored. And disk quota worker
>>>>> process consuming the queue and refresh the disk usage/quota for this
>>>>> database. After some periods, worker will return the database to the queue,
>>>>> and fetch the top database from queue to process. The period determine the
>>>>> delay of detecting disk quota change. To implement this feature, we need to
>>>>> support a subprocess of postmaster to rebind to another database instead of
>>>>> the database binded in InitPostgres().4. Support active table detection on
>>>>> vacuum full and vacuum analyze. Currently vacuum full and vacuum analyze
>>>>> are not tracked by stat collector.Thanks to Heikki, Pavel Stehule,Chapman
>>>>> Flack for the former comments on disk quota feature. Any comments on how to
>>>>> improve disk quota feature are appreciated.*
>>>>>
>>>>>
>>>>> On Mon, Sep 3, 2018 at 12:05 PM, Pavel Stehule <
>>>>> pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> 2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>>>>>>
>>>>>>> Thanks Pavel.
>>>>>>> Your patch did enforcement on storage level(md.c or we could also
>>>>>>> use smgr_extend). It's straight forward.
>>>>>>> But I prefer to implement disk_quota as a feature with following
>>>>>>> objectives:
>>>>>>> 1 set/alter disk quota setting on different database objects, e.g.
>>>>>>> user, database, schema etc. not only a general GUC, but we could set
>>>>>>> separate quota limit for a specific objects.
>>>>>>> 2 enforcement operator should work at two positions: before query is
>>>>>>> running and when query is running. The latter one's implementation maybe
>>>>>>> similar to your patch.
>>>>>>>
>>>>>>
>>>>>> The patch was just example. The resource quotes should be more
>>>>>> complex - per partition, table, schema, database, user - so GUC are
>>>>>> possible, but not very user friendly.
>>>>>>
>>>>>> Our case is specific, but not too much. The servers are used for
>>>>>> multidimensional analyses - and some tables can grow too fast (COPY, INSERT
>>>>>> SELECT). We need to solve limits immediately. The implementation is simple,
>>>>>> so I did it. Same implementation on database level, or schema level needs
>>>>>> some more locks, so it will not be too effective. The resource management
>>>>>> can be complex very complex, and I expect so it will be hard work.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Pavel
>>>>>>
>>>>>>
>>>>>>> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <
>>>>>>> pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>>>>>>>>
>>>>>>>>> Thanks Chapman.
>>>>>>>>> @Pavel, could you please explain more about your second
>>>>>>>>> suggestion "implement some quotas on storage level?"
>>>>>>>>>
>>>>>>>>
>>>>>>>> See attached patch - it is very simple - and good enough for our
>>>>>>>> purposes.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Pavel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> We will not keep the long-lived processes attach to all
>>>>>>>>> databases(just like you mentioned servers with thousands of
>>>>>>>>> databases)
>>>>>>>>> And you are right, we could share ideas with autovacuum process,
>>>>>>>>> fork worker processes in need.
>>>>>>>>> "autovacuum checks for tables that have had a large number of
>>>>>>>>> inserted, updated or deleted tuples. These checks use the statistics
>>>>>>>>> collection facility"
>>>>>>>>> diskquota process is similar to autovacuum at caring about insert,
>>>>>>>>> but the difference is that it also care about vucuum full, truncate and
>>>>>>>>> drop. While update and delete may not be interested since no file change
>>>>>>>>> happens. So a separate diskquota process is preferred.
>>>>>>>>>
>>>>>>>>> So if we implemented disk quota as a full native feature, and in
>>>>>>>>> the first initial version I prefer to implement the following features:
>>>>>>>>> 1 Fork diskquota launcher process under Postmaster serverloop,
>>>>>>>>> which is long-lived.
>>>>>>>>> 2 Diskquota launcher process is responsible for
>>>>>>>>> creating diskquota worker process for every database.
>>>>>>>>> 3 DIskquota setting is stored in a separate catalog table for each
>>>>>>>>> database.
>>>>>>>>> 4 Initialization stage, Diskquota launcher process creates
>>>>>>>>> diskquota worker process for all the databases(traverse like
>>>>>>>>> autovacuum). Worker process calculates disk usage of db objects and
>>>>>>>>> their diskquota setting. If any db object exceeds its quota limit, put them
>>>>>>>>> into the blacklist in the shared memory, which will later be used by
>>>>>>>>> enforcement operator. Worker process exits when works are done.
>>>>>>>>> 5 Running stage, Diskquota launcher process creates diskquota worker
>>>>>>>>> process for the database with a large number of insert, copy, truncate,
>>>>>>>>> drop etc. or create disk quota statement. Worker process updates the file
>>>>>>>>> size for db objects containing the result relation, and compare with the
>>>>>>>>> diskquota setting. Again, if exceeds quota limit, put them into blacklist,
>>>>>>>>> remove from blacklist vice versa. Worker process exits when works
>>>>>>>>> are done and a GUC could control the frequency of worker process restart to
>>>>>>>>> a specific database. As you know, this GUC also controls the delay when we
>>>>>>>>> do enforcement.
>>>>>>>>> 6 Enforcement. When postgres backend executes queries, check the
>>>>>>>>> blacklist in shared memory to determine whether the query is allowed(before
>>>>>>>>> execute) or need rollback(is executing)?
>>>>>>>>>
>>>>>>>>> If we implemented disk quota as an extension, we could just use
>>>>>>>>> background worker to start diskquota launcher process and use
>>>>>>>>> RegisterDynamicBackgroundWorker() to fork child diskquota worker
>>>>>>>>> processes by the launcher process as suggested by @Chapman.
>>>>>>>>> Diskquota setting could be stored in user table in a separate schema for
>>>>>>>>> each database(Schema and table created by create extension statement) just
>>>>>>>>> like what Heikki has done in pg_quota project. But in this case, we need to
>>>>>>>>> create extension for each database before diskquota worker process can be
>>>>>>>>> set up for that database.
>>>>>>>>>
>>>>>>>>> Any comments on the above design and which is preferred, native
>>>>>>>>> feature or extension as the POC?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -- Hubert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <
>>>>>>>>> pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <chap(at)anastigmatix(dot)net>:
>>>>>>>>>>
>>>>>>>>>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>>>>>>>>>
>>>>>>>>>>> > 2 Keep one worker process for each database. But using a
>>>>>>>>>>> parent/global
>>>>>>>>>>> > quota worker process to manage the lifecycle of database level
>>>>>>>>>>> worker
>>>>>>>>>>> > processes. It could handle the newly created database(avoid
>>>>>>>>>>> restart
>>>>>>>>>>> > database) and save resource when a database is not used. But
>>>>>>>>>>> this needs to
>>>>>>>>>>> > change worker process to be hierarchical. Postmaster becomes
>>>>>>>>>>> the grandfather
>>>>>>>>>>> > of database level worker processes in this case.
>>>>>>>>>>>
>>>>>>>>>>> I am using background workers this way in 9.5 at $work.
>>>>>>>>>>>
>>>>>>>>>>> In my case, one worker lives forever, wakes up on a set period,
>>>>>>>>>>> and
>>>>>>>>>>> starts a short-lived worker for every database, waiting for each
>>>>>>>>>>> one before starting the next.
>>>>>>>>>>>
>>>>>>>>>>> It was straightforward to implement. Looking back over the code,
>>>>>>>>>>> I see the global worker assigns its own PID to
>>>>>>>>>>> worker.bgw_notify_pid
>>>>>>>>>>> of each of its children, and also obtains a handle for each child
>>>>>>>>>>> from RegisterDynamicBackgroundWorker().
>>>>>>>>>>>
>>>>>>>>>>> I imagine the global quota worker would prefer to start workers
>>>>>>>>>>> for every database and then just wait for notifications from any
>>>>>>>>>>> of them, but that seems equally straightforward at first glance.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> There are servers with thousands databases. Worker per database
>>>>>>>>>> is not good idea.
>>>>>>>>>>
>>>>>>>>>> It should to share ideas, code with autovacuum process.
>>>>>>>>>>
>>>>>>>>>> Not sure, how to effective implementation based on bg workers can
>>>>>>>>>> be. On servers with large set of databases, large set of tables it can
>>>>>>>>>> identify too big table too late.
>>>>>>>>>>
>>>>>>>>>> Isn't better to implement some quotas on storage level?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Pavel
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -Chap
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Hubert Zhang
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>>
>>>>>>> Hubert Zhang
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks
>>>>>
>>>>> Hubert Zhang
>>>>>
>>>>
>>>
>>> --
>>> Thanks
>>>
>>> Hubert Zhang
>>>
>>
>
> --
> Thanks
>
> Hubert Zhang
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-09-22 11:57:04 Re: Adding a note to protocol.sgml regarding CopyData
Previous Message Amit Kapila 2018-09-22 06:55:09 Re: pg_atomic_exchange_u32() in ProcArrayGroupClearXid()