Re: [PROPOSAL] VACUUM Progress Checker.

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, rahilasyed90(at)gmail(dot)com, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, pokurev(at)pm(dot)nttdata(dot)co(dot)jp, Vinayak Pokale <vinpokale(at)gmail(dot)com>
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2015-12-10 06:28:14
Message-ID: CAB7nPqRNw=w4mt-W+gtq0ED0KTR=B8Qgu6D+4BN3XmzFRuAgXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Hm, I guess progress messages would not change that much over the course
>> of a long-running command. How about we pass only the array(s) that we
>> change and NULL for others:
>>
>> With the following prototype for pgstat_report_progress:
>>
>> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
>> bool *message_param[], int num_message_param);
>>
>> If we just wanted to change, say scanned_heap_pages, then we report that
>> using:
>>
>> uint32_param[1] = scanned_heap_pages;
>> pgstat_report_progress(uint32_param, 3, NULL, 0);
>>
>> Also, pgstat_report_progress() should check each of its parameters for
>> NULL before iterating over to copy. So in most reports over the course of
>> a command, the message_param would be NULL and hence not copied.
>
> It's going to be *really* important that this facility provides a
> lightweight way of updating progress, so I think this whole API is
> badly designed. VACUUM, for example, is going to want a way to update
> the individual counter for the number of pages it's scanned after
> every page. It should not have to pass all of the other information
> that is part of a complete report. It should just be able to say
> pgstat_report_progress_update_counter(1, pages_scanned) or something
> of this sort. Don't marshal all of the data and then make
> pgstat_report_progress figure out what's changed. Provide a series of
> narrow APIs where the caller tells you exactly what they want to
> update, and you update only that. It's fine to have a
> pgstat_report_progress() function to update everything at once, for
> the use at the beginning of a command, but the incremental updates
> within the command should do something lighter-weight.

[first time looking really at the patch and catching up with this thread]

Agreed. As far as I can guess from the code, those should be as
followed to bloat pgstat message queue a minimum:

+ pgstat_report_activity_flag(ACTIVITY_IS_VACUUM);
/*
* Loop to process each selected relation.
*/
Defining a new routine for this purpose is a bit surprising. Cannot we
just use pgstat_report_activity with a new BackendState STATE_INVACUUM
or similar if we pursue the progress tracking approach?

A couple of comments:
- The relation OID should be reported and not its name. In case of a
relation rename that would not be cool for tracking, and most users
are surely going to join with other system tables using it.
- The progress tracking facility adds a whole level of complexity for
very little gain, and IMO this should *not* be part of PgBackendStatus
because in most cases its data finishes wasted. We don't expect
backends to run frequently such progress reports, do we? My opinion on
the matter if that we should define a different collector data for
vacuum, with something like PgStat_StatVacuumEntry, then have on top
of it a couple of routines dedicated at feeding up data with it when
some work is done on a vacuum job.

In short, it seems to me that this patch needs a rework, and should be
returned with feedback. Other opinions?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2015-12-10 06:48:26 Re: Speedup twophase transactions
Previous Message Amit Kapila 2015-12-10 06:26:47 Re: Move PinBuffer and UnpinBuffer to atomics