Re: Block level parallel vacuum WIP

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Block level parallel vacuum WIP
Date: 2017-01-06 17:38:28
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Yeah, I don't have a good solution for this problem so far.
>> We might need to improve group locking mechanism for the updating
>> operation or came up with another approach to resolve this problem.
>> For example, one possible idea is that the launcher process allocates
>> vm and fsm enough in advance in order to avoid extending fork relation
>> by parallel workers, but it's not resolve fundamental problem.

I got some advices at PGConf.ASIA 2016 and started to work on this again.

The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page. I'm not sure this is the
best idea but it's very simple and enough to support parallel vacuum.
More smart idea could be needed when we want to support parallel DML
and so on.

001 patch adds parallel option to VACUUM command. As Robert suggested
before, parallel option is set with parallel degree.

=# VACUUM (PARALLEL 4) table_name;

..means 4 background processes are launched and background process
executes lazy_scan_heap while the launcher (leader) process waits for
all vacuum workers finish. If N = 1 or without parallel option, leader
process itself executes lazy_scan_heap.

Internal Design
I changed the parallel vacuum internal design. Collecting garbage on
table is processed in block-level parallel. For tables with indexes,
each index on table is assigned to each vacuum worker and all garbage
on a index are processed by particular assigned vacuum worker.

The all spaces for the array of dead tuple TIDs used by vacuum worker
are allocated in dynamic shared memory by launcher process. Vacuum
worker process stores dead tuple location into its dead tuple array
without lock, the TIDs in a dead tuple array are ordered by TID. Note
that entire space for dead tuple, that is a bunch of dead tuple array,
are not ordered.

If table has index, all dead tuple TIDs needs to be shared with all
vacuum workers before actual reclaiming dead tuples starts and these
data should be cleared after all vacuum worker finished to use them.
So I put two synchronization points at where before reclaiming dead
tuples and where after finished to reclaim them. At these points,
parallel vacuum worker waits for all other workers to reach to the
same point. Once all vacuum workers reached to same point, vacuum
worker resumes next operation.

For example, If a table has five indexes and we execute parallel lazy
vacuum on that table with three vacuum workers, two of three vacuum
workers are assigned two indexes and another one vacuum worker is
assigned to one indexes. After the amount of dead tuple of all vacuum
worker reached to maintenance_work_mem size vacuum worker starts to
reclaim dead tuple on table and index. A vacuum worker that is
assigned to one index finishes (probably first) and sleeps until other
two vacuum workers finish to vacuum. If table has no index then each
parallel vacuum worker vacuums each page as we go.

I measured the execution time of vacuum on dirty table with several
parallel degree in my poor environment.

table_size | indexes | parallel_degree | time
6.5GB | 0 | 1 | 00:00:14
6.5GB | 0 | 2 | 00:00:02
6.5GB | 0 | 4 | 00:00:02
6.5GB | 1 | 1 | 00:00:13
6.5GB | 1 | 2 | 00:00:15
6.5GB | 1 | 4 | 00:00:18
6.5GB | 2 | 1 | 00:02:18
6.5GB | 2 | 2 | 00:00:38
6.5GB | 2 | 4 | 00:00:46
13GB | 0 | 1 | 00:03:52
13GB | 0 | 2 | 00:00:49
13GB | 0 | 4 | 00:00:50
13GB | 1 | 1 | 00:01:41
13GB | 1 | 2 | 00:01:59
13GB | 1 | 4 | 00:01:24
13GB | 2 | 1 | 00:12:42
13GB | 2 | 2 | 00:01:17
13GB | 2 | 4 | 00:02:12

In result of my measurement, vacuum execution time got better in some
cases but didn't improve in case where index = 1. I'll investigate the

* Vacuum progress support.
* Storage parameter support, perhaps parallel_vacuum_workers parameter
which allows autovacuum to use parallel vacuum on specified table.

I register this to next CF.


Masahiko Sawada
NTT Open Source Software Center

Attachment Content-Type Size
000_make_group_locking_conflict_extend_lock_v2.patch application/octet-stream 748 bytes
001_parallel_vacuum_v2.patch application/octet-stream 91.8 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-01-06 17:53:01 Placement of InvokeObjectPostAlterHook calls
Previous Message Robert Haas 2017-01-06 16:16:26 Re: Performance degradation in Bitmapscan (commit 75ae538bc3168bf44475240d4e0487ee2f3bb376)