Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-10-04 14:03:40
Message-ID: CAD21AoARj=e=6_KOZnaR66jRkDmGaVdLcrt33Ua-zMUugKU3mQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 4, 2019 at 2:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Oct 2, 2019 at 7:29 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>> On Sat, Sep 21, 2019 at 9:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> > *
>> > In function compute_parallel_workers, don't we want to cap the number
>> > of workers based on maintenance_work_mem as we do in
>> > plan_create_index_workers?
>> >
>> > The basic point is how we want to treat maintenance_work_mem for
>> > this feature. Do we want all workers together to use at most
>> > maintenance_work_mem, or is each worker allowed to use
>> > maintenance_work_mem? I would prefer the former unless we have a
>> > good reason to follow the latter strategy.
>> >
>> > Accordingly, we might need to update the below paragraph in docs:
>> > "Note that parallel utility commands should not consume substantially
>> > more memory than equivalent non-parallel operations. This strategy
>> > differs from that of parallel query, where resource limits generally
>> > apply per worker process. Parallel utility commands treat the
>> > resource limit <varname>maintenance_work_mem</varname> as a limit to
>> > be applied to the entire utility command, regardless of the number of
>> > parallel worker processes."
>>
>> I'd also prefer to use at most maintenance_work_mem during parallel
>> vacuum regardless of the number of parallel workers, which is the
>> current implementation. In lazy vacuum, maintenance_work_mem is used
>> to record the item pointers of dead tuples. This is done by the
>> leader process, and worker processes just refer to them when
>> vacuuming dead index tuples. Even if the user sets a small amount of
>> maintenance_work_mem, parallel vacuum would still be helpful, since
>> index vacuuming still takes time. So I thought we should cap the
>> number of parallel workers by the number of indexes rather than by
>> maintenance_work_mem.
>>
>
> Isn't that true only if we never use maintenance_work_mem during index cleanup? However, I think we do use it during index cleanup, see for example ginInsertCleanup. Before reaching any conclusion about what to do here, I think we first need to establish whether this is a problem. If I am correct, then only some of the index cleanups (like the gin index) use maintenance_work_mem, so we need to consider that point while designing a solution for this.
>

I got your point. Currently, a single-process lazy vacuum could
consume at most (maintenance_work_mem * 2) memory, because we do index
cleanup while still holding the dead tuple space, as you mentioned.
And ginInsertCleanup is also called at the beginning of ginbulkdelete.
In the current parallel lazy vacuum, each parallel vacuum worker could
consume additional memory, apart from the memory used for the heap
scan, depending on the implementation of the target index AM. Given
the current single and parallel vacuum implementations, it would be
better to control the total amount of memory rather than the number of
parallel workers. So one approach I came up with is to make each
vacuum worker use (maintenance_work_mem / # of participants) as its
maintenance_work_mem. It might be too small in some cases, but it
doesn't consume more memory than a single lazy vacuum, as long as the
index AM doesn't consume more memory regardless of
maintenance_work_mem. I think it really depends on the implementation
of the index AM.
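
To sketch the idea in code (just an illustration of the approach, not
the patch itself; the function name and the 1MB floor are made up):

/*
 * Illustration only: give each participant (the leader plus the
 * launched workers) an equal share of maintenance_work_mem, so that
 * the parallel vacuum as a whole stays within the single-process
 * limit.  maintenance_work_mem is in kB.
 */
static int
compute_parallel_vacuum_work_mem(int nworkers)
{
    int     nparticipants = nworkers + 1;   /* workers plus the leader */
    int     share = maintenance_work_mem / nparticipants;

    return Max(share, 1024);    /* arbitrary 1MB floor for this sketch */
}

Each participant would then pass this value to the index AM calls in
place of maintenance_work_mem, so e.g. a gin cleanup running in a
worker would be limited to its share.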

>>
>> > *
>> > + keys++;
>> > +
>> > + /* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
>> > + maxtuples = compute_max_dead_tuples(nblocks, true);
>> > + est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
>> > + mul_size(sizeof(ItemPointerData), maxtuples)));
>> > + shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
>> > + keys++;
>> > +
>> > + shm_toc_estimate_keys(&pcxt->estimator, keys);
>> > +
>> > + /* Finally, estimate VACUUM_KEY_QUERY_TEXT space */
>> > + querylen = strlen(debug_query_string);
>> > + shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
>> > + shm_toc_estimate_keys(&pcxt->estimator, 1);
>> >
>> > The code style looks inconsistent here. In some cases, you are
>> > calling shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
>> > and in other cases, you are accumulating keys. I think it is better
>> > to call shm_toc_estimate_keys immediately after shm_toc_estimate_chunk
>> > in all cases.
>>
>> Fixed. But there is some code that calls shm_toc_estimate_keys for
>> multiple keys at once, for example in nbtsort.c and parallel.c. What
>> is the difference?
>>
>
> We can do it either way, depending on the situation. For example, in nbtsort.c, there is an if check based on which the 'number of keys' can vary. I think here we should write it in a way that doesn't leave the reader wondering why it is done in a particular way. This is the reason I told you to be consistent.

Understood. Thank you for the explanation!
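
For the quoted snippet, then, I understand the suggested style to be
roughly the following (same names and sizes as above, only with the
key accounting moved next to each chunk estimate):

/* Estimate size for dead tuples -- PARALLEL_VACUUM_KEY_DEAD_TUPLES */
maxtuples = compute_max_dead_tuples(nblocks, true);
est_deadtuples = MAXALIGN(add_size(sizeof(LVDeadTuples),
                                   mul_size(sizeof(ItemPointerData), maxtuples)));
shm_toc_estimate_chunk(&pcxt->estimator, est_deadtuples);
shm_toc_estimate_keys(&pcxt->estimator, 1);

/* Estimate VACUUM_KEY_QUERY_TEXT space */
querylen = strlen(debug_query_string);
shm_toc_estimate_chunk(&pcxt->estimator, querylen + 1);
shm_toc_estimate_keys(&pcxt->estimator, 1);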

>
>>
>> >
>> > *
>> > +void
>> > +heap_parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
>> > +{
>> > ..
>> > + /* Open table */
>> > + onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);
>> > ..
>> > }
>> >
>> > I don't think it is a good idea to assume the lock mode as
>> > ShareUpdateExclusiveLock here. Tomorrow, if due to some reason there
>> > is a change in lock level for the vacuum process, we might forget to
>> > update it here. I think it is better if we can get this information
>> > from the master backend.
>>
>> So did you mean to declare the lock mode for lazy vacuum somewhere
>> as a global variable and use it in both try_relation_open in the
>> leader process and relation_open in the worker process? Otherwise we
>> would end up adding something like shared->lmode =
>> ShareUpdateExclusiveLock during parallel context initialization,
>> which doesn't seem to resolve your concern.
>>
>
> I was thinking that we could find a way to pass the lockmode we used in vacuum_rel, but I guess we would need to pass it through multiple functions, which would be a bit inconvenient. OTOH, today I checked nbtsort.c (_bt_parallel_build_main) and found that there, too, we use it directly instead of passing it from the master backend. I think we can leave it as you have it in the patch, but add a comment on why it is okay to use that lock mode?

Yeah, agreed.
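
I'll add a comment to heap_parallel_vacuum_main. A rough draft of what
I have in mind (wording still to be polished):

/*
 * Open the target table.  It is safe to hard-code
 * ShareUpdateExclusiveLock here because the leader has already
 * acquired this lock on the relation in vacuum_rel() and holds it for
 * the duration of the parallel vacuum, so taking the same lock mode in
 * the worker cannot block.  If the lock level used by vacuum is ever
 * changed, this needs to be updated as well.
 */
onerel = heap_open(lvshared->relid, ShareUpdateExclusiveLock);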

Regards,

--
Masahiko Sawada
