Re: Block level parallel vacuum WIP

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block level parallel vacuum WIP
Date: 2016-09-10 10:44:24
Message-ID: CABOikdP3_fRm9p69u6bZd8F=hvfawy8Z9wTjiapuDdv45=k36w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> >>> As for PoC, I implemented parallel vacuum so that each worker
> >>> processes both 1 and 2 phases for particular block range.
> >>> Suppose we vacuum 1000 blocks table with 4 workers, each worker
> >>> processes 250 consecutive blocks in phase 1 and then reclaims dead
> >>> tuples from heap and indexes (phase 2).
> >>
> >> So each worker is assigned a range of blocks, and processes them in
> >> parallel? This does not sound good performance-wise. I recall emails
> >> from Robert and Amit on the matter for sequential scan saying that
> >> this would hurt performance, particularly on rotating disks.
> >>
> >
> > The implementation in the patch is the same as what we initially
> > thought of for sequential scan, but it turned out not to be a good
> > approach because it can lead to an inappropriate balance of work
> > among workers. If one worker finishes its share early, it won't be
> > able to take on more.
>
> Ah, so it was the reason. Thanks for confirming my doubts on what is
> proposed.
> --
>
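To make the block-range scheme under discussion concrete, here is a hypothetical sketch (not code from the patch) of how a heap of N blocks could be divided into contiguous per-worker ranges, as in the 1000-blocks / 4-workers example above:

```python
def block_ranges(nblocks, nworkers):
    """Split nblocks heap blocks into contiguous per-worker ranges.

    Returns a list of (start, end) tuples with end exclusive; when the
    division is uneven, the earlier workers get one extra block each.
    """
    base, extra = divmod(nblocks, nworkers)
    ranges = []
    start = 0
    for i in range(nworkers):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# The example from the thread: 1000 blocks, 4 workers
# -> [(0, 250), (250, 500), (500, 750), (750, 1000)]
```

This static partitioning is exactly what makes the work imbalance possible: a worker whose range happens to contain few dead tuples finishes early and then sits idle.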

I believe Sawada-san has received enough feedback on the design to work out
the next steps. It seems natural that the vacuum workers are each assigned a
portion of the heap to scan and collect dead tuples (similar to what the
patch does), and that the same workers are responsible for the second phase
of the heap scan.

But as far as index scans are concerned, I agree with Tom that the best
strategy is to assign a different index to each worker process and let them
vacuum the indexes in parallel. That way the work for each worker process is
clearly cut out and they don't contend for the same resources, which means
the first patch, to allow multiple backends to wait for a cleanup buffer, is
not required. Later we could extend this further such that multiple workers
can vacuum a single index by splitting the work on physical boundaries, but
even that will ensure a clear demarcation of work and hence no contention on
index blocks.
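The one-index-per-worker idea can be sketched roughly as follows. This is a hypothetical illustration only (the names `parallel_index_vacuum` and `vacuum_one` are made up, and real parallel vacuum workers would be separate backends, not threads): workers pull whole indexes from a shared queue, so no two workers ever touch the same index and there is no contention on index blocks.

```python
import threading
from queue import Queue, Empty

def parallel_index_vacuum(index_names, nworkers, vacuum_one):
    """Each worker repeatedly claims one whole index from a shared
    queue and processes it with vacuum_one(name); an index is handled
    by exactly one worker, so workers never contend on index blocks."""
    q = Queue()
    for name in index_names:
        q.put(name)

    def worker():
        while True:
            try:
                name = q.get_nowait()
            except Empty:
                return  # no indexes left to claim
            vacuum_one(name)

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A nice property of pulling from a queue rather than pre-assigning indexes is that work balances itself: a worker that finishes a small index simply claims the next one.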

ISTM this will require further work and it probably makes sense to mark the
patch as "Returned with feedback" for now.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
