Re: [HACKERS] Block level parallel vacuum

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-01-22 12:59:33
Message-ID: CAJrrPGe_11PeypnMpWrsZh4W_+yMdwWWJKstBBW0jL=sDm2KfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 18, 2019 at 11:42 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Fri, Jan 18, 2019 at 10:38 AM Haribabu Kommi
> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> >
> >
> > On Tue, Jan 15, 2019 at 6:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> >>
> >>
> >> Rebased.
> >
> >
> > I started reviewing the patch, I didn't finish my review yet.
> > Following are some of the comments.
>
> Thank you for reviewing the patch.
>
> >
> > + <term><literal>PARALLEL <replaceable
> class="parameter">N</replaceable></literal></term>
> > + <listitem>
> > + <para>
> > + Execute index vacuum and cleanup index in parallel with
> >
> > I doubt that user can understand the terms index vacuum and cleanup
> index.
> > May be it needs some more detailed information.
> >
>
> Agreed. Table 27.22 "Vacuum phases" has a good description of vacuum
> phases. So maybe adding the referencint to it would work.
>

OK.

> >
> > -typedef enum VacuumOption
> > +typedef enum VacuumOptionFlag
> > {
> >
> > I don't find the new name quite good, how about VacuumFlags?
> >
>
> Agreed with removing "Option" from the name but I think VacuumFlag
> would be better because this enum represents only one flag. Thoughts?
>

OK.

>
> > postgres=# vacuum (parallel 2, verbose) tbl;
> >
> > With verbose, no parallel workers related information is available.
> > I feel giving that information is required even when it is not parallel
> > vacuum also.
> >
>
> Agreed. How about the folloiwng verbose output? I've added the number
> of launched, planned and requested vacuum workers and purpose (vacuum
> or cleanup).
>
> postgres(1:91536)=# vacuum (verbose, parallel 30) test; -- table
> 'test' has 3 indexes
> INFO: vacuuming "public.test"
> INFO: launched 2 parallel vacuum workers for index vacuum (planned:
> 2, requested: 30)
> INFO: scanned index "test_idx1" to remove 2000 row versions
> DETAIL: CPU: user: 0.12 s, system: 0.00 s, elapsed: 0.12 s
> INFO: scanned index "test_idx2" to remove 2000 row versions by
> parallel vacuum worker
> DETAIL: CPU: user: 0.07 s, system: 0.05 s, elapsed: 0.12 s
> INFO: scanned index "test_idx3" to remove 2000 row versions by
> parallel vacuum worker
> DETAIL: CPU: user: 0.09 s, system: 0.05 s, elapsed: 0.14 s
> INFO: "test": removed 2000 row versions in 10 pages
> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> INFO: launched 2 parallel vacuum workers for index cleanup (planned:
> 2, requested: 30)
> INFO: index "test_idx1" now contains 991151 row versions in 2745 pages
> DETAIL: 2000 index row versions were removed.
> 24 index pages have been deleted, 18 are currently reusable.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> INFO: index "test_idx2" now contains 991151 row versions in 2745 pages
> DETAIL: 2000 index row versions were removed.
> 24 index pages have been deleted, 18 are currently reusable.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> INFO: index "test_idx3" now contains 991151 row versions in 2745 pages
> DETAIL: 2000 index row versions were removed.
> 24 index pages have been deleted, 18 are currently reusable.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> INFO: "test": found 2000 removable, 367 nonremovable row versions in
> 41 out of 4425 pages
> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 500
> There were 6849 unused item pointers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 0 pages are entirely empty.
> CPU: user: 0.12 s, system: 0.01 s, elapsed: 0.17 s.
> VACUUM
>

The verbose output is good.

Since the previous patch conflicts with 285d8e12 I've attached the
> latest version patch that incorporated the review comment I got.
>

Thanks for the latest patch. I have some more minor comments.

+ Execute index vacuum and cleanup index in parallel with

Better to use vacuum index and cleanup index? This is in same with
the description of vacuum phases. It is better to follow same notation
in the patch.

+ dead_tuples = lazy_space_alloc(lvstate, nblocks, parallel_workers);

With the change, the lazy_space_alloc takes care of initializing the
parallel vacuum, can we write something related to that in the comments.

+ initprog_val[2] = dead_tuples->max_dead_tuples;

dead_tuples variable may need rename for better reading?

+ if (lvshared->indstats[idx].updated)
+ result = &(lvshared->indstats[idx].stats);
+ else
+ copy_result = true;

I don't see a need for copy_result variable, how about directly using
the updated flag to decide whether to copy or not? Once the result is
copied update the flag.

+use Test::More tests => 34;

I don't find any new tetst are added in this patch.

I am thinking of performance penalty if we use the parallel option of
vacuum on a small sized table?

Regards,
Haribabu Kommi
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Surafel Temesgen 2019-01-22 13:13:02 Re: pg_dump multi VALUES INSERT
Previous Message Fabien COELHO 2019-01-22 12:43:18 Re: Alternative to \copy in psql modelled after \g