Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2018-12-18 07:59:05
Message-ID: CAD21AoA1T5AbvuJGCnoUr6BHnNe+Pzo-+nYWQApuzRbeJwTJxA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 27, 2018 at 11:26 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Nov 26, 2018 at 2:08 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Nov 25, 2018 at 2:35 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > >
> >
> > Thank you for the comment.
> >
> > > > I could see that you have put a lot of effort on this patch and still
> > > > we are not able to make much progress mainly I guess because of
> > > > relation extension lock problem. I think we can park that problem for
> > > > some time (as already we have invested quite some time on it), discuss
> > > > a bit about actual parallel vacuum patch and then come back to it.
> > > >
> > >
> > > Today, I was reading this and the previous related thread [1], and it
> > > seems to me that multiple people (Andres [2], Simon [3]) have pointed
> > > out that parallelizing the index portion is more valuable. Also, some
> > > of the results [4] indicate the same. Now, when there are no indexes,
> > > parallelizing heap scans also has a benefit, but I think in practice we
> > > will see more cases where the user wants to vacuum tables with
> > > indexes. So how about we break this problem up in the following way,
> > > where each piece gives a benefit of its own:
> > > (a) Parallelize index scans wherein the workers will be launched only
> > > to vacuum indexes. Only one worker per index will be spawned.
> > > (b) Parallelize per-index vacuum. Each index can be vacuumed by
> > > multiple workers.
> > > (c) Parallelize heap scans where multiple workers will scan the heap,
> > > collect dead TIDs and then launch multiple workers for indexes.
> > >
> > > I think if we break this problem into multiple patches, it will reduce
> > > the scope of each patch and help us in making progress. Now, it's
> > > been more than 2 years that we are trying to solve this problem, but
> > > still didn't make much progress. I understand there are various
> > > genuine reasons and all of that work will help us in solving all the
> > > problems in this area. How about if we first target problem (a) and
> > > once we are done with that we can see which of (b) or (c) we want to
> > > do first?
> >
> > Thank you for the suggestion. It seems good to me. We would get nice
> > performance scalability even with only (a), and vacuum will get more
> > powerful with (b) or (c). Also, (a) would not require resolving the
> > relation extension lock issue, IIUC.
> >
>
> Yes, I also think so. We do acquire the 'relation extension lock' during
> index vacuum, but as part of (a), we are talking about one worker per
> index, so there shouldn't be a problem with respect to deadlocks.
>
> > I'll change the patch and submit
> > to the next CF.
> >
>
> Okay.
>

Attached are the updated patches. I scaled back the scope of this patch.
The patch now includes only feature (a); that is, it executes both index
vacuum and index cleanup in parallel. It also doesn't include
autovacuum support for now.

The PARALLEL option works almost the same as before. In the VACUUM
command, we can specify a 'PARALLEL n' option, where n is the number of
parallel workers to request. If n is omitted, the number of parallel
workers is (# of indexes - 1). We can also specify the parallel degree
with the parallel_workers reloption. The number of parallel workers is
capped by Min(# of indexes - 1, max_parallel_maintenance_workers). That
is, parallel vacuum can be executed for a table only if it has more than
one index.
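
To make the worker-count rule above concrete, here is a minimal sketch in
C. It is not code from the patch: the function name, the parameter names,
and the convention that 0 means "n was omitted" are all assumptions made
for illustration, and the reloption is left out for brevity.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only. */
static int
min_int(int a, int b)
{
    return a < b ? a : b;
}

/*
 * Sketch of the rule described above (names are assumptions):
 *   nrequested  - the n in 'PARALLEL n'; 0 stands for "n omitted"
 *   nindexes    - number of indexes on the table
 *   max_workers - stand-in for the server-wide maintenance worker limit
 * Returns the number of parallel workers to request; 0 means the vacuum
 * is not run in parallel.
 */
int
compute_parallel_vacuum_workers(int nrequested, int nindexes, int max_workers)
{
    int cap;

    assert(nrequested >= 0 && nindexes >= 0 && max_workers >= 0);

    /* The cap is Min(# of indexes - 1, maintenance worker limit). */
    cap = min_int(nindexes - 1, max_workers);
    if (cap <= 0)
        return 0;           /* fewer than two indexes, or no workers allowed */

    /* With n omitted, the default of (# of indexes - 1) is still capped. */
    if (nrequested == 0)
        return cap;

    return min_int(nrequested, cap);
}
```

For example, under these assumptions a table with 4 indexes and a limit
of 2 would get 2 workers when n is omitted, and a table with a single
index would never be vacuumed in parallel.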

For the internal design, the details are written in the comment at the
top of vacuumlazy.c. In parallel vacuum mode, we allocate a DSM segment
at the beginning of lazy vacuum, which stores shared information as well
as the dead tuples. When starting either index vacuum or index cleanup,
we launch parallel workers. The parallel workers perform either index
vacuum or index cleanup on each index, and exit once all indexes are
done. The leader process then re-initializes the DSM and re-launches
workers the next time, rather than destroying the parallel context here.
After lazy vacuum is done, the leader process exits parallel mode and
updates the index statistics, since no writes are allowed during
parallel mode.
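
The order of operations described above can be sketched as a small
single-process simulation in C. This is only an illustration of the
control flow, not the code in vacuumlazy.c: the struct, the function
names, and the "claim the next index from a shared counter" scheme are
assumptions, and the workers are run serially here instead of as real
parallel workers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_INDEXES 8

/* Hypothetical stand-in for the shared state kept in the DSM segment. */
typedef struct SharedVacuumState
{
    int  nindexes;                  /* number of indexes on the table */
    int  next_index;                /* next index to be claimed */
    int  passes_done[MAX_INDEXES];  /* completed passes per index */
    bool in_parallel_mode;
    bool stats_updated;
} SharedVacuumState;

/* A worker processes indexes until none are left, then exits. */
static void
worker_run(SharedVacuumState *shared)
{
    while (shared->next_index < shared->nindexes)
    {
        int idx = shared->next_index++;

        shared->passes_done[idx]++;     /* index vacuum or cleanup */
    }
}

/*
 * Leader side. npasses counts every index vacuum or cleanup round.
 * Returns the number of times workers were launched.
 */
int
simulate_parallel_lazy_vacuum(SharedVacuumState *shared, int nindexes,
                              int npasses)
{
    int launches = 0;

    assert(nindexes >= 0 && nindexes <= MAX_INDEXES && npasses >= 0);

    /* Allocate and initialize the shared state once, at the start. */
    memset(shared, 0, sizeof(*shared));
    shared->nindexes = nindexes;
    shared->in_parallel_mode = true;

    for (int pass = 0; pass < npasses; pass++)
    {
        worker_run(shared);         /* launch workers for this round */
        launches++;
        shared->next_index = 0;     /* re-initialize; keep the context */
    }

    /* Leave parallel mode first; only then write the index statistics. */
    shared->in_parallel_mode = false;
    shared->stats_updated = true;

    return launches;
}
```

The point of the sketch is the ordering: the shared state is set up
once, reset between rounds instead of being torn down, and the
statistics update happens only after parallel mode has ended.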

Also, I've attached the 0002 patch, which adds parallel lazy vacuum
support to the vacuumdb command.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment                                           Content-Type         Size
v9-0002-Add-P-option-to-vacuumdb-command.patch       application/x-patch  5.8 KB
v9-0001-Add-parallel-option-to-VACUUM-command.patch  application/x-patch  70.7 KB
