Re: maintenance_work_mem used by Vacuum

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: maintenance_work_mem used by Vacuum
Date: 2019-10-09 10:12:18
Message-ID: CAFiTN-vm28tfbMur2kYH3JKTs0WdfyabUgM2RJVNJ13WYJ_TfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 9, 2019 at 2:40 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Oct 9, 2019 at 2:00 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Wed, Oct 9, 2019 at 10:22 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Oct 8, 2019 at 2:45 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Tue, Oct 8, 2019 at 1:48 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > > > >
> > > > > ISTM that the use of maintenance_work_mem wasn't given that much
> > > > > thought originally.
> > > > >
> > > >
> > > > One idea to something better could be to check, if there is a GIN
> > > > index on a table, then use 1/4 (25% or whatever) of
> > > > maintenance_work_mem for GIN indexes and 3/4 (75%) of
> > > > maintenance_work_mem for collection dead tuples.
> > > >
> > >
> > > I felt that it would not be easy for users to tune
> > > maintenance_work_mem which controls more than one things. If this is
> > > an index AM(GIN) specific issue we might rather want to control the
> > > memory limit of pending list cleanup by a separate GUC parameter like
> > > gin_pending_list_limit, say gin_pending_list_work_mem.
>
> Sure, by having another work_mem parameter for the Gin indexes which
> controls when we need to flush the pending list will make life easier
> as a programmer. I think if we have a specific parameter for this
> purpose, then we can even think of using the same for a clean up
> during insert operation as well. However, I am not sure how easy it
> would be for users? Basically, now they need to remember another
> parameter and for which there is no easy way to know what should be
> the value. I think one has to check
> gin_metapage_info->n_pending_pages and then based on that they can
> configure the value for this parameter to get the maximum benefit
> possible.
>
> Can we think of using work_mem for this? Basically, we use work_mem
> during insert operation, so why not use it during vacuum operation for
> this purpose?
>
> Another idea could be to try to divide the maintenance_work_mem
> smartly if we know the value of pending_pages for each Gin index, but
> I think for that we need to either read the metapage of maybe use some
> sort of stats which can be used by vacuum. We need to somehow divide
> it based on the amount of memory required for a number of dead tuples
> in heap and memory required by tuples in the pending list. I am not
> sure how feasible is this approach.
>
> About difficulty for users tuning one or two parameters for vacuum, I
> think if they can compute what could be the values for Guc's
> separately, then why can't they add up and set it as one value.
> Having said that, I am not denying that having a separate parameter
> gives better control, and for this specific case using separate
> parameter can allow us to use it both during vacuum and insert
> operations.
>
> > > And we can
> > > either set the (the memory for GIN pending list cleanup / # of GIN
> > > indexes) to the parallel workers.
> > >
> > IMHO if we do that then we will loose the meaning of having
> > maintenance_work_mem right? Then user can not control that how much
> > memory the autovacuum worker will use.
> >
>
> I am not sure how different it is from the current situation?
> Basically, now it can use up to 2 * maintenance_work_mem memory and if
> we do what Sawada-San is proposing, then it will be
> maintenance_work_mem + gin_*_work_mem. Do you have some other
> alternative idea in mind or you think the current situation is better
> than anything else we can do in this area?

I think the current situation is not good but if we try to cap it to
maintenance_work_mem + gin_*_work_mem then also I don't think it will
make the situation much better. However, I think the idea you
proposed up-thread[1] is better. At least the maintenance_work_mem
will be the top limit what the auto vacuum worker can use.

[1] https://www.postgresql.org/message-id/CAA4eK1JhY88BXC%3DZK%3D89MALm%2BLyMkMhi6WG6AZfE4%2BKij6mebg%40mail.gmail.com

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-10-09 11:50:33 Re: Standby accepts recovery_target_timeline setting?
Previous Message Alvaro Herrera 2019-10-09 09:36:35 Re: dropping column prevented due to inherited index