Re: [PROPOSAL] VACUUM Progress Checker.

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: robertmhaas(at)gmail(dot)com, amitlangote09(at)gmail(dot)com, pokurev(at)pm(dot)nttdata(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org, bannos(at)nttdata(dot)co(dot)jp
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2016-03-09 02:01:06
Message-ID: 56DF83E2.3050105@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Horiguchi-san,

Thanks for the review!

On 2016/03/08 18:19, Kyotaro HORIGUCHI wrote:
>> Updated versions attached.
>>
>> * changed st_progress_param to int64 and so did the argument of
>> pgstat_progress_update_param(). Likewise changed param1..param10 of
>> pg_stat_get_progress_info()'s output columns to bigint.
>>
>> * Added back the Oid field st_command_target and corresponding function
>> pgstat_progress_set_command_target(Oid).
>
> + beentry->st_command = COMMAND_INVALID;
> + MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
>
> The MemSet seems useless since it gets the same initialization on
> setting st_command.

Right, every backend start should not have to pay that price. Fixed in
the latest version.

>
> + /*
> + * Report values for only those backends which are running the given
> + * command. XXX - privilege check is maybe dubious.
> + */
> + if (!beentry ||
> + beentry->st_command != cmdtype ||
> + !has_privs_of_role(GetUserId(), beentry->st_userid))
> + continue;
>
> We can simplly ignore unpriviledged backends, or all zeroz or
> nulls to signal that the caller has no priviledge.

As suggested by Robert, used pg_stat_get_activity() style. In this case,
show 'pid' to all but the rest only to role members.

> 0002
>
> + FROM pg_stat_get_progress_info(1) AS S;
>
> Ah... This magick number seems quite bad.. The function should
> take the command type in maybe string type.
>
> + FROM pg_stat_get_progress_info('lazy vacuum') AS S;
>
> Using an array of the names would be acceptable, maybe.
>
> | char *progress_command_names[] = {'lazy vacuum', NULL};

Hm, I think the way it is *may* be OK the way it is, but...

As done in the patch, the way we identify commands is with the enum
BackendCommandType:

+typedef enum BackendCommandType
+{
+ COMMAND_INVALID = 0,
+ COMMAND_LAZY_VACUUM
+} BackendCommandType;

Perhaps we could create a struct:

typedef struct PgStatProgressCommand
{
char *cmd_name;
BackendCommandType cmd_type;
} PgStatProgressCommand;

static const struct PgStatProgressCommand commands[] = {
{"vacuum", COMMAND_LAZY_VACUUM},
{NULL, COMMAND_INVALID}
};

> However the numbers for the phases ('scanning heap' and so..) is
> acceptable for me for reasons uncertain to me, it also could be
> represented in names but is might be rahter bothersome..

In initial versions of the patch, it used to be char * that was passed for
identifying phases. But, then we got rid of char * progress parameters
altogether. So, there are no longer any text columns in
pg_stat_get_progress_info()'s result. It may not work out well in long
run to forever not have those (your recent example comes to mind).

>
> + WHEN 0 THEN 100::numeric(5, 2)
> + ELSE ((S.param3 + 1)::numeric / S.param2 * 100)::numeric(5, 2)
>
> This usage of numeric seems overkill to me.

Hmm, how could this rather be written?

>> * I attempted to implement a method to report index blocks done from
>> lazy_tid_reaped() albeit with some limitations. Patch 0003 is that
>> attempt. In summary, I modified the index bulk delete callback interface
>> to receive a BlockNumber argument index_blkno:

[ snip ]

>> way of traversing the index pages. I dared only touch the btree case to
>> make it pass current block number to the callback. It finishes with
>> index_blks_done << total_index_blks since I guess the callback is called
>> only on the leaf pages. Any ideas?
>
> To the contrary, I suppose it counts one index page more than
> once for the cases of uncorrelated heaps. index_blks_vacuumd can
> exceed RelationGetNumberOfBlocks() in extreme cases. If I'm not
> missing something, it stands on a quite fragile graound.

Yeah, the method is not entirely foolproof yet.

>> * I am also tempted to add num_dead_tuples and dead_tuples_vacuumed to add
>> granularity to 'vacuuming heap' phase but didn't in this version. Should we?
>
> How do you think they are to be used?

I just realized there are objections to some columns be counters for pages
and others counting tuples. So, I guess I withdraw. I am just worried
that 'vacuuming heap' phase may take arbitrarily long if dead tuples array
is big. Since we were thinking of adding more granularity to 'vacuuming
indexes' phase, I thought we should do for the former too.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2016-03-09 02:12:31 Re: syslog configurable line splitting behavior
Previous Message Petr Jelinek 2016-03-09 01:56:36 Re: WIP: Access method extendability