Re: [PROPOSAL] VACUUM Progress Checker.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: robertmhaas(at)gmail(dot)com, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, alvherre(at)2ndquadrant(dot)com, rahilasyed90(at)gmail(dot)com, Jim(dot)Nasby(at)bluetreble(dot)com, pgsql-hackers(at)postgresql(dot)org, masao(dot)fujii(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, thom(at)linux(dot)com, pokurev(at)pm(dot)nttdata(dot)co(dot)jp, vinpokale(at)gmail(dot)com
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2015-12-10 07:30:05
Message-ID: 20151210.163005.29167515.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

At Thu, 10 Dec 2015 15:28:14 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqRNw=w4mt-W+gtq0ED0KTR=B8Qgu6D+4BN3XmzFRuAgXQ(at)mail(dot)gmail(dot)com>
> On Thu, Dec 10, 2015 at 4:40 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Mon, Dec 7, 2015 at 2:41 AM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >> Hm, I guess progress messages would not change that much over the course
> >> of a long-running command. How about we pass only the array(s) that we
> >> change and NULL for others:
> >>
> >> With the following prototype for pgstat_report_progress:
> >>
> >> void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
> >> bool *message_param[], int num_message_param);
> >>
> >> If we just wanted to change, say scanned_heap_pages, then we report that
> >> using:
> >>
> >> uint32_param[1] = scanned_heap_pages;
> >> pgstat_report_progress(uint32_param, 3, NULL, 0);
> >>
> >> Also, pgstat_report_progress() should check each of its parameters for
> >> NULL before iterating over to copy. So in most reports over the course of
> >> a command, the message_param would be NULL and hence not copied.
> >
> > It's going to be *really* important that this facility provides a
> > lightweight way of updating progress, so I think this whole API is
> > badly designed. VACUUM, for example, is going to want a way to update
> > the individual counter for the number of pages it's scanned after
> > every page. It should not have to pass all of the other information
> > that is part of a complete report. It should just be able to say
> > pgstat_report_progress_update_counter(1, pages_scanned) or something
> > of this sort. Don't marshal all of the data and then make
> > pgstat_report_progress figure out what's changed. Provide a series of
> > narrow APIs where the caller tells you exactly what they want to
> > update, and you update only that. It's fine to have a
> > pgstat_report_progress() function to update everything at once, for
> > the use at the beginning of a command, but the incremental updates
> > within the command should do something lighter-weight.
>
> [first time looking really at the patch and catching up with this thread]
>
> Agreed. As far as I can guess from the code, those should be as
> followed to bloat pgstat message queue a minimum:
>
> + pgstat_report_activity_flag(ACTIVITY_IS_VACUUM);
> /*
> * Loop to process each selected relation.
> */
> Defining a new routine for this purpose is a bit surprising. Cannot we
> just use pgstat_report_activity with a new BackendState STATE_INVACUUM
> or similar if we pursue the progress tracking approach?

The author might want to know vacuum status *after* it has been
finished. But it is reset just after the end of a vacuum. One
concern is that BackendState adds new value for
pg_stat_activiry.state and it might confuse someone using it but
I don't see other issue on it.

> A couple of comments:
> - The relation OID should be reported and not its name. In case of a
> relation rename that would not be cool for tracking, and most users
> are surely going to join with other system tables using it.

+1

> - The progress tracking facility adds a whole level of complexity for
> very little gain, and IMO this should *not* be part of PgBackendStatus
> because in most cases its data finishes wasted. We don't expect
> backends to run frequently such progress reports, do we?

I strongly thought the same thing but I haven't came up with
better place for it to be stored. but,

> My opinion on
> the matter if that we should define a different collector data for
> vacuum, with something like PgStat_StatVacuumEntry, then have on top
> of it a couple of routines dedicated at feeding up data with it when
> some work is done on a vacuum job.

+many. But I can't guess the appropriate number of the entry of
it, or suitable replacing policy on excesive number of
vacuums. Although sane users won't run vacuum on so many
backends.

> In short, it seems to me that this patch needs a rework, and should be
> returned with feedback. Other opinions?

This is important feature for DBAs so I agree with you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-12-10 07:57:38 Re: Error with index on unlogged table
Previous Message Jeff Janes 2015-12-10 06:48:26 Re: Speedup twophase transactions