Re: [PROPOSAL] VACUUM Progress Checker.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc: amitlangote09(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, pokurev(at)pm(dot)nttdata(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, bannos(at)nttdata(dot)co(dot)jp
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2016-03-07 09:18:55
Message-ID: 20160307.181855.64000802.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, Amit.

At Mon, 7 Mar 2016 16:16:30 +0900, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote in <56DD2ACE(dot)5050208(at)lab(dot)ntt(dot)co(dot)jp>
>
> Horiguchi-san,
>
> Thanks a lot for taking a look!
>
> On 2016/03/07 13:02, Kyotaro HORIGUCHI wrote:
> > At Sat, 5 Mar 2016 16:41:29 +0900, Amit Langote wrote:
> >> On Sat, Mar 5, 2016 at 4:24 PM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> >>> So, I took the Vinayak's latest patch and rewrote it a little
> >> ...
> >>> I broke it into two:
> >>>
> >>> 0001-Provide-a-way-for-utility-commands-to-report-progres.patch
> >>> 0002-Implement-progress-reporting-for-VACUUM-command.patch
> >>
> >> Oops, unamended commit messages in those patches are misleading. So,
> >> please find attached corrected versions.
> >
> > The 0001-P.. adds the following interface functions.
> >
> > +extern void pgstat_progress_set_command(BackendCommandType cmdtype);
> > +extern void pgstat_progress_set_command_target(Oid objid);
> > +extern void pgstat_progress_update_param(int index, uint32 val);
> > +extern void pgstat_reset_local_progress(void);
> > +extern int pgstat_progress_get_num_param(BackendCommandType cmdtype);
> >
> > I don't like to treat the target object id differently from other
> > parameters. It could not be needed at all, or could be needed two
> > or more in contrast. Although oids are not guaranteed to fit
> > uint32, we have already stored BlockNumber there.
>
> I thought giving cmdtype and objid each its own slot would make things a
> little bit clearer than stuffing them into st_progress_param[0] and
> st_progress_param[1], respectively. Is that what you are suggesting?
> Although as I've don, a separate field st_command_objid may be a bit too much.

I mentioned only of object id as you seem to take me. The command
type is essential unlike the target object ids. It is needed by
all statistics views of this kind to filter required backends.

> If they are not special fields, I think we don't need special interface
> functions *set_command() and *set_command_target(). But I am still
> inclined toward keeping the former.
>
> > # I think that integer arrays might be needed to be passed as a
> > # parameter, but it would be the another issue.
>
> Didn't really think about it. Maybe we should consider a scenario that
> would require it.

Imagine to provide a statictics of a vacuum commnad as a
whole. It will vacuum several relations at once so the view could
be like the following.

select * from pg_stat_vacuum_command;
- [ Record 1 ]
worker_pid : 3243
command : vacuum full
rels_scheduled : {16387, 16390, 16393}
rels_finished : {16384}
status : Processing 16384, awiting for a lock.
..

This needs arrays if we want this but it would be another issue
as I said.

> > pg_stat_get_progress_info returns a tuple with 10 integer columns
> > (plus an object id). The reason why I suggested use of an integer
> > array is that it allows the API to serve arbitrary number of
> > parmeters without a modification of API, and array indexes are
> > coloreless than any concrete names. Howerver I don't stick to
> > that if we agree that it is ok to have fixed number of paremters.
>
> I think the fixed number of parameters in the form of a fixed-size array
> is because st_progress_param[] is part of a shared memory structure as
> discussed before. Although such interface has been roughly modeled on how
> pg_statistic catalog and pg_stats view or get_attstatsslot() function
> work, shared memory structures take the place of the catalog, so there are
> some restrictions (fixed size array being one).

It depends on how easy we take it to widen the parameter slots in
shared memory:p Anyway I don't stick that since it doesn't make
a siginificant difference.

> Regarding index into st_progress_param[], pgstat.c/pgstatfuncs.c should
> not bother what it is. As exemplified in patch 0002, individual index
> numbers can be defined as macros by individual command modules (suggested
> by Robert recently) with certain convention for readability such as the
> following in lazyvacuum.c:
>
> #define PROG_PAR_VAC_RELID 0
> #define PROG_PAR_VAC_PHASE_ID 1
> #define PROG_PAR_VAC_HEAP_BLKS 2
> #define PROG_PAR_VAC_CUR_HEAP_BLK 3
> ... so on.
>
> Then, to report a changed parameter:
>
> pgstat_progress_update_param(PROG_PAR_VAC_PHASE_ID, LV_PHASE_SCAN_HEAP);
> ...
> pgstat_progress_update_param(PROG_PAR_VAC_CUR_HEAP_BLK, blkno);

Yeah, it seems fine for me.

> by the way, following is proargnames[] for pg_stat_get_progress_info():
>
> cmdtype integer,
> OUT pid integer,
> OUT param1 integer,
> OUT param2 integer,
> ...
> OUT param10 integer
>
> So, it is a responsibility of a command specific progress view definition
> that it interprets values of param1..param10 appropriately. In fact, the
> implementer of the progress reporting for a command determines what goes
> into which slot of st_progress_param[], to begin with.

It seems quite fine, too.

> > pgstat_progress_get_num_param looks not good in the aspect of
> > genericity. I'd like to define it as an integer array by idexed
> > by the command type if it is needed. However it seems to me to be
> > enough that pg_stat_get_progress_info always returns 10 integers
> > regardless of what the numbers are for. The user sql function,
> > pg_stat_vacuum_progress as the first user, knows how many numbers
> > should be read for its work. It reads zeroes safely even if it
> > reads more than what the producer side offered (unless it tries
> > to divide something with it).
>
> Thinking a bit, perhaps we don't need num_param(cmdtpye) function or array
> at all as you seem to suggest. It serves no useful purpose now that I see

^^; Sorry for the cryptic description..

> it. pg_stat_get_progress_info() should simply copy
> st_progress_param[0...PG_STAT_GET_PROGRESS_COLS-1] to the result and view
> definer knows what's what.
>
> Attached updated patches which incorporate above mentioned changes. If
> Vinayak has something else in mind about anything, he can weigh in.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2016-03-07 09:32:53 Re: Way to check whether a particular block is on the shared_buffer?
Previous Message Tatsuo Ishii 2016-03-07 09:06:43 Re: How can we expand PostgreSQL ecosystem?