Re: Control your disk usage in PG: Introduction to Disk Quota Extension

From: Hubert Zhang <hzhang(at)pivotal(dot)io>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)pivotal(dot)io>
Subject: Re: Control your disk usage in PG: Introduction to Disk Quota Extension
Date: 2019-02-18 06:39:04
Message-ID: CAB0yrekg=fMNpf6jXoxd3iGDCdf4JpRqqX2g9POUpm9g1PYbVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi Andres

On Sat, Feb 16, 2019 at 12:53 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
> On 2019-01-30 10:26:52 +0800, Hubert Zhang wrote:
> > Hi Michael, Robert
> > For you question about the hook position, I want to explain more about
> the
> > background why we want to introduce these hooks.
> > We wrote a diskquota extension <
> https://github.com/greenplum-db/diskquota>
> > [ ...]
> > On Tue, Jan 22, 2019 at 12:08 PM Hubert Zhang <hzhang(at)pivotal(dot)io> wrote:
> >
> > > > For this particular purpose, I don't immediately see why you need a
> > >> > hook in both places. If ReadBuffer is called with P_NEW, aren't we
> > >> > guaranteed to end up in smgrextend()?
> > >> Yes, that's a bit awkward.
>
> Please note that on PG lists the customary and desired style is to quote
> inline in messages rather than top-quote.
>
> Greetings,
>
> Andres Freund
>

Thanks for your note. I will reply the above questions again.

On Wed, Nov 21, 2018 at 10:48 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Nov 20, 2018 at 2:20 AM Haozhou Wang <hawang(at)pivotal(dot)io> wrote:
> > We prepared a patch that includes the hook points. And such hook points
> are needed for disk quota extension.
> > There are two hooks.
> > One is SmgrStat_hook. It's used to perform ad-hoc logic in storage when
> doing smgr create/extend/truncate in general. As for disk quota extension,
> this hook is used to detect active tables(new created tables, tables
> extending new blocks, or tables being truncated)
> > The other is BufferExtendCheckPerms_hook. It's used to perform ad-hoc
> logic when buffer extend a new block. Since ReadBufferExtended is a hot
> function, we call this hook only when blockNum == P_NEW. As for disk quota
> extension, this hook is used to do query enforcement during the query is
> loading data.
> >
> > Any comments are appreciated.
>
> +1 for adding some hooks to support this kind of thing, but I think
> the names you've chosen are not very good. The hook name should
> describe the place from which it is called, not the purpose for which
> one imagines that it will be used, because somebody else might imagine
> another use. Both BufferExtendCheckPerms_hook_type and
> SmgrStat_hook_type are imagining that they know what the hook does -
> CheckPerms in the first case and Stat in the second case.
>
> For this particular purpose, I don't immediately see why you need a
> hook in both places. If ReadBuffer is called with P_NEW, aren't we
> guaranteed to end up in smgrextend()?

We have removed the hook in ReadBufferExtended and only keep the hooks in
SMGR.

On Wed, Dec 26, 2018 at 1:56 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Wed, Nov 21, 2018 at 09:47:44AM -0500, Robert Haas wrote:
> > +1 for adding some hooks to support this kind of thing, but I think
> > the names you've chosen are not very good. The hook name should
> > describe the place from which it is called, not the purpose for which
> > one imagines that it will be used, because somebody else might imagine
> > another use. Both BufferExtendCheckPerms_hook_type and
> > SmgrStat_hook_type are imagining that they know what the hook does -
> > CheckPerms in the first case and Stat in the second case.
>
> I personally don't mind making Postgres more pluggable, but I don't
> think that we actually need the extra ones proposed here at the layer
> of smgr, as smgr is already a layer designed to call an underlying set
> of APIs able to extend, unlink, etc. depending on the storage type.
>
>
The reason to use smgr as the hook position is as follows:
These hooks will be used by diskquota extension, which needs to gather the
current disk usage for each schema. We want to use hooks to detect the
Active Tables, and only recalculate the disk size of all the Active Tables.
As you mentioned, smgr is the layer to call underlying API to extend/unlink
files. So it's also the place to detect the table size change, i.e. the
Active Tables.
For example, the smgrextend hook indicates that you allocate a new block
and the corresponding table needs to be treated as Active Table.

Besides, suppose we use smgr as hook position, I want to discuss the API
passed to the hook function.
Take smgrextend as example. The function interface of smgrextend is like
that:
```
void smgrextend
(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum, char *buffer,
bool skipFsync);
```
So the hook api of smgrextend could have two main options.
Hook API Option1
```
typedef void (*smgrextend_hook_type)
(RelFileNode smgr_rnode,ForkNumber forknum);
```
Hook API Option 2
```
typedef void (*smgrextend_hook_type)
(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,char *buffer,
bool skipFsync);
```

As for Option1. Since diskquota extension only needs the relfilenode
information to detect the active tables, Option1 just pass the RelFileNode
to the hook function. It's more clear and has semantic meaning.

While Option 2 is to pass the original parameters to the hook functions
without any filter. This is more general and let the different hook
implementations to decide how to use these parameters.

Option 1 also needs some additional work to handle smgrdounlinkall case,
whose input parameter is the SMgrRelation list. We may need to palloc
Relfilenode list and pfree it manually.
smgrdounlinkall function interface:
```
smgrdounlinkall(SMgrRelation *rels, int nrels, bool isRedo, char
*relstorages)
```

Based on the assumption we use smgr as hook position, hook API option1 or
option2 which is better?
Or we could find some balanced API between option1 and option2?

Again comments on other better hook positions are also appreciated!

--
Thanks

Hubert Zhang

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Matsumura, Ryo 2019-02-18 06:53:39 RE: [PROPOSAL]a new data type 'bytea' for ECPG
Previous Message Andrey Klychkov 2019-02-18 06:20:10 PGAdmin 4 don't refresh server info after restarting