Re: Proposal for disk quota feature

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hubert Zhang <hzhang(at)pivotal(dot)io>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for disk quota feature
Date: 2018-09-03 04:05:40
Message-ID: CAFj8pRCg4s5NW3ECsudQRhib=gA3J=rex11KYogfFJeLwNTrTQ@mail.gmail.com
Lists: pgsql-hackers

2018-09-03 3:49 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:

> Thanks Pavel.
> Your patch does enforcement at the storage level (md.c, or we could
> also use smgr_extend). It's straightforward.
> But I prefer to implement disk_quota as a feature with the following
> objectives:
> 1. Set/alter disk quota settings on different database objects, e.g.
> user, database, schema etc.; not only a general GUC, but a separate
> quota limit for a specific object.
> 2. The enforcement operator should work at two positions: before a
> query runs and while it is running. The latter's implementation may
> be similar to your patch.
>

The patch was just an example. Resource quotas should be richer than
that: per partition, table, schema, database, user. So GUCs are
possible, but not very user friendly.

Our case is specific, but not unusually so. The servers are used for
multidimensional analyses, and some tables can grow too fast (COPY,
INSERT ... SELECT). We needed to enforce limits immediately, and the
implementation is simple, so I did it. The same check at the database
or schema level would need more locks, so it would not be as
effective. Resource management in general can get very complex, and I
expect it will be hard work.
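
For list readers without the attachment: very roughly, the idea is a
check of the following shape. This is only a sketch, not the attached
patch itself, and max_relation_blocks is a made-up GUC name.

/* sketch against src/backend/storage/smgr/md.c */
int max_relation_blocks = 0;    /* hypothetical GUC, 0 disables it */

void
mdextend(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
         char *buffer, bool skipFsync)
{
    /* refuse to extend any relation fork past the configured limit */
    if (max_relation_blocks > 0 &&
        blocknum >= (BlockNumber) max_relation_blocks)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("relation %u would exceed %d blocks",
                        reln->smgr_rnode.node.relNode,
                        max_relation_blocks)));

    /* ... existing mdextend() logic continues unchanged ... */
}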

Regards

Pavel

> On Sun, Sep 2, 2018 at 8:44 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>> Hi
>>
>> 2018-09-02 14:18 GMT+02:00 Hubert Zhang <hzhang(at)pivotal(dot)io>:
>>
>>> Thanks Chapman.
>>> @Pavel, could you please explain more about your second suggestion,
>>> "implement some quotas at the storage level"?
>>>
>>
>> See attached patch - it is very simple - and good enough for our
>> purposes.
>>
>> Regards
>>
>> Pavel
>>
>>
>>
>>> We will not keep long-lived processes attached to all databases (as
>>> you mentioned, there are servers with thousands of databases).
>>> And you are right, we could borrow ideas from the autovacuum
>>> process and fork worker processes as needed.
>>> "autovacuum checks for tables that have had a large number of
>>> inserted, updated or deleted tuples. These checks use the
>>> statistics collection facility"
>>> The diskquota process is similar to autovacuum in caring about
>>> inserts, but it also cares about VACUUM FULL, TRUNCATE and DROP,
>>> while UPDATE and DELETE may not be of interest since no file-size
>>> change happens. So a separate diskquota process is preferred.
>>>
>>> If we implement disk quota as a full native feature, in the
>>> initial version I prefer to implement the following:
>>> 1. Fork a long-lived diskquota launcher process under the
>>> postmaster's server loop.
>>> 2. The launcher process is responsible for creating a diskquota
>>> worker process for each database.
>>> 3. Diskquota settings are stored in a separate catalog table for
>>> each database.
>>> 4. Initialization stage: the launcher creates a diskquota worker
>>> process for every database (traversing them like autovacuum). Each
>>> worker calculates the disk usage of db objects and compares it with
>>> their diskquota settings. If any db object exceeds its quota limit,
>>> it is put into a blacklist in shared memory, which is later used by
>>> the enforcement operator. The worker exits when its work is done.
>>> 5. Running stage: the launcher creates a diskquota worker process
>>> for a database with a large number of INSERT, COPY, TRUNCATE, DROP
>>> etc., or a CREATE DISK QUOTA statement. The worker updates the file
>>> sizes of the db objects containing the result relation and compares
>>> them with the diskquota settings. Again, objects that exceed their
>>> quota limit are added to the blacklist, and removed from it once
>>> they fall back under the limit. The worker exits when its work is
>>> done; a GUC could control how often a worker is restarted for a
>>> specific database. Note that this GUC also controls the enforcement
>>> delay.
>>> 6. Enforcement: when a postgres backend executes a query, it checks
>>> the blacklist in shared memory to determine whether the query is
>>> allowed (before execution) or needs to be rolled back (while
>>> executing). A rough sketch of such a check follows this list.
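>>>
>>> The sketch, to make (6) concrete: all names here are placeholders,
>>> not a settled API; the blacklist is assumed to be a shared-memory
>>> hash protected by an LWLock.
>>>
>>> typedef struct BlacklistEntry
>>> {
>>>     Oid dbid;           /* database containing the object */
>>>     Oid relid;          /* relation that exceeded its quota */
>>> } BlacklistEntry;
>>>
>>> static HTAB   *quota_blacklist;   /* attached at shmem startup */
>>> static LWLock *diskquota_lock;    /* protects the blacklist */
>>>
>>> /* called before executing a query that writes to relid */
>>> static void
>>> diskquota_check_relation(Oid relid)
>>> {
>>>     BlacklistEntry key;
>>>     bool    found;
>>>
>>>     key.dbid = MyDatabaseId;
>>>     key.relid = relid;
>>>
>>>     LWLockAcquire(diskquota_lock, LW_SHARED);
>>>     hash_search(quota_blacklist, &key, HASH_FIND, &found);
>>>     LWLockRelease(diskquota_lock);
>>>
>>>     if (found)
>>>         ereport(ERROR,
>>>                 (errcode(ERRCODE_DISK_FULL),
>>>                  errmsg("disk quota exceeded for relation %u",
>>>                         relid)));
>>> }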
>>>
>>> If we implemented disk quota as an extension, we could simply use a
>>> background worker to start the diskquota launcher process, and have
>>> the launcher fork child diskquota worker processes via
>>> RegisterDynamicBackgroundWorker(), as suggested by @Chapman.
>>> Diskquota settings could be stored in a user table in a separate
>>> schema for each database (schema and table created by the CREATE
>>> EXTENSION statement), just like what Heikki has done in the
>>> pg_quota project. But in this case, we need to run CREATE EXTENSION
>>> in each database before a diskquota worker process can be set up
>>> for it.
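>>>
>>> A minimal sketch of how the launcher could start one per-database
>>> worker; the "diskquota" library and "diskquota_worker_main" entry
>>> point are assumptions for illustration, not decided names.
>>>
>>> static bool
>>> launch_worker_for_database(Oid dboid)
>>> {
>>>     BackgroundWorker        worker;
>>>     BackgroundWorkerHandle *handle;
>>>     pid_t                   pid;
>>>
>>>     memset(&worker, 0, sizeof(worker));
>>>     worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
>>>                        BGWORKER_BACKEND_DATABASE_CONNECTION;
>>>     worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
>>>     worker.bgw_restart_time = BGW_NEVER_RESTART; /* short-lived */
>>>     snprintf(worker.bgw_library_name, BGW_MAXLEN, "diskquota");
>>>     snprintf(worker.bgw_function_name, BGW_MAXLEN,
>>>              "diskquota_worker_main");
>>>     snprintf(worker.bgw_name, BGW_MAXLEN,
>>>              "diskquota worker for database %u", dboid);
>>>     worker.bgw_main_arg = ObjectIdGetDatum(dboid);
>>>     worker.bgw_notify_pid = MyProcPid;  /* notify the launcher */
>>>
>>>     if (!RegisterDynamicBackgroundWorker(&worker, &handle))
>>>         return false;
>>>     return WaitForBackgroundWorkerStartup(handle, &pid)
>>>            == BGWH_STARTED;
>>> }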
>>>
>>> Any comments on the above design, and which is preferred for the
>>> POC: native feature or extension?
>>>
>>>
>>> -- Hubert
>>>
>>>
>>>
>>> On Fri, Aug 31, 2018 at 3:32 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>>> wrote:
>>>
>>>>
>>>>
>>>> 2018-08-30 16:22 GMT+02:00 Chapman Flack <chap(at)anastigmatix(dot)net>:
>>>>
>>>>> On 08/30/2018 09:57 AM, Hubert Zhang wrote:
>>>>>
>>>>> > 2. Keep one worker process for each database, but use a
>>>>> > parent/global quota worker process to manage the lifecycle of
>>>>> > the database-level worker processes. It could handle newly
>>>>> > created databases (avoiding a database restart) and save
>>>>> > resources when a database is not in use. But this requires the
>>>>> > worker processes to be hierarchical; the postmaster becomes the
>>>>> > grandfather of the database-level worker processes in this case.
>>>>>
>>>>> I am using background workers this way in 9.5 at $work.
>>>>>
>>>>> In my case, one worker lives forever, wakes up on a set period, and
>>>>> starts a short-lived worker for every database, waiting for each
>>>>> one before starting the next.
>>>>>
>>>>> It was straightforward to implement. Looking back over the code,
>>>>> I see the global worker assigns its own PID to worker.bgw_notify_pid
>>>>> of each of its children, and also obtains a handle for each child
>>>>> from RegisterDynamicBackgroundWorker().
>>>>>
>>>>> I imagine the global quota worker would prefer to start workers
>>>>> for every database and then just wait for notifications from any
>>>>> of them, but that seems equally straightforward at first glance.
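>>>>>
>>>>> In rough outline, with a hypothetical launch_one_worker()
>>>>> standing in for the registration details and database_oids an
>>>>> assumed List of OIDs to visit, that sequential loop is just:
>>>>>
>>>>> ListCell *cell;
>>>>>
>>>>> foreach(cell, database_oids)
>>>>> {
>>>>>     BackgroundWorkerHandle *handle =
>>>>>         launch_one_worker(lfirst_oid(cell));
>>>>>
>>>>>     /* wait for this child to exit before starting the next */
>>>>>     if (handle != NULL)
>>>>>         WaitForBackgroundWorkerShutdown(handle);
>>>>> }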
>>>>>
>>>>
>>>> There are servers with thousands of databases. A worker per
>>>> database is not a good idea.
>>>>
>>>> It should share ideas and code with the autovacuum process.
>>>>
>>>> I am not sure how effective an implementation based on bgworkers
>>>> can be. On servers with a large set of databases and a large set
>>>> of tables, it may identify an oversized table too late.
>>>>
>>>> Isn't it better to implement some quotas at the storage level?
>>>>
>>>> Regards
>>>>
>>>> Pavel
>>>>
>>>>
>>>>
>>>>> -Chap
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks
>>>
>>> Hubert Zhang
>>>
>>
>>
>
>
> --
> Thanks
>
> Hubert Zhang
>
