Re: maintenance_work_mem used by Vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: maintenance_work_mem used by Vacuum
Date: 2019-10-10 04:28:17
Message-ID: CAD21AoCxV2RzoOok_=U2pd0HgQjX74mVvWcUEu08PFT0eJFU_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 9, 2019 at 7:12 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Oct 9, 2019 at 2:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > > > > >
> > > > > > ISTM that the use of maintenance_work_mem wasn't given that much
> > > > > > thought originally.
> > > > > >
> > > > >
> > > > > One idea to something better could be to check, if there is a GIN
> > > > > index on a table, then use 1/4 (25% or whatever) of
> > > > > maintenance_work_mem for GIN indexes and 3/4 (75%) of
> > > > > maintenance_work_mem for collection dead tuples.
> > > > >
> > > >
> > > > I felt that it would not be easy for users to tune
> > > > maintenance_work_mem which controls more than one things. If this is
> > > > an index AM(GIN) specific issue we might rather want to control the
> > > > memory limit of pending list cleanup by a separate GUC parameter like
> > > > gin_pending_list_limit, say gin_pending_list_work_mem.
> >
> > Sure, by having another work_mem parameter for the Gin indexes which
> > controls when we need to flush the pending list will make life easier
> > as a programmer. I think if we have a specific parameter for this
> > purpose, then we can even think of using the same for a clean up
> > during insert operation as well. However, I am not sure how easy it
> > would be for users? Basically, now they need to remember another
> > parameter and for which there is no easy way to know what should be
> > the value. I think one has to check
> > gin_metapage_info->n_pending_pages and then based on that they can
> > configure the value for this parameter to get the maximum benefit
> > possible.
> >
> > Can we think of using work_mem for this? Basically, we use work_mem
> > during insert operation, so why not use it during vacuum operation for
> > this purpose?
> >
> > Another idea could be to try to divide the maintenance_work_mem
> > smartly if we know the value of pending_pages for each Gin index, but
> > I think for that we need to either read the metapage of maybe use some
> > sort of stats which can be used by vacuum. We need to somehow divide
> > it based on the amount of memory required for a number of dead tuples
> > in heap and memory required by tuples in the pending list. I am not
> > sure how feasible is this approach.
> >
> > About difficulty for users tuning one or two parameters for vacuum, I
> > think if they can compute what could be the values for Guc's
> > separately, then why can't they add up and set it as one value.
> > Having said that, I am not denying that having a separate parameter
> > gives better control, and for this specific case using separate
> > parameter can allow us to use it both during vacuum and insert
> > operations.
> >
> > > > And we can
> > > > either set the (the memory for GIN pending list cleanup / # of GIN
> > > > indexes) to the parallel workers.
> > > >
> > > IMHO if we do that then we will loose the meaning of having
> > > maintenance_work_mem right? Then user can not control that how much
> > > memory the autovacuum worker will use.
> > >
> >
> > I am not sure how different it is from the current situation?
> > Basically, now it can use up to 2 * maintenance_work_mem memory and if
> > we do what Sawada-San is proposing, then it will be
> > maintenance_work_mem + gin_*_work_mem. Do you have some other
> > alternative idea in mind or you think the current situation is better
> > than anything else we can do in this area?
>
> I think the current situation is not good but if we try to cap it to
> maintenance_work_mem + gin_*_work_mem then also I don't think it will
> make the situation much better. However, I think the idea you
> proposed up-thread[1] is better. At least the maintenance_work_mem
> will be the top limit what the auto vacuum worker can use.
>

I'm concerned that there are other index AMs that could consume more
memory like GIN. In principle we can vacuum third party index AMs and
will be able to even parallel vacuum them. I expect that
maintenance_work_mem is the top limit of memory usage of maintenance
command but actually it's hard to set the limit to memory usage of
bulkdelete and cleanup by the core. So I thought that since GIN is the
one of the index AM it can have a new parameter to make its job
faster. If we have that parameter it might not make the current
situation much better but user will be able to set a lower value to
that parameter to not use the memory much while keeping the number of
index vacuums.

Regards,

--
Masahiko Sawada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message btendouan 2019-10-10 04:30:54 Re: pgbench - extend initialization phase control
Previous Message Michael Paquier 2019-10-10 04:13:37 Re: dropping column prevented due to inherited index