Re: maintenance_work_mem used by Vacuum

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: maintenance_work_mem used by Vacuum
Date: 2019-10-08 05:59:56
Message-ID: CAA4eK1JhpNsTiHj+JOy3N8uCGyTBMH8xDhUEtBw8ZeCAPRGp6Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 8, 2019 at 12:57 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sun, Oct 6, 2019 at 6:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > We wanted to decide how a parallel vacuum should use memory: can each worker consume maintenance_work_mem to clean up the gin index, or should all workers together use no more than maintenance_work_mem? We were thinking of the latter, but before we decide what the right behavior for parallel vacuum is, I thought it is better to first discuss whether the current memory usage model is right.
>
> Well, I had the idea when we were developing parallel query that we
> should just ignore the problem of work_mem: every node can use X
> amount of work_mem, and if there are multiple copies of the node in
> multiple processes, then you probably end up using more memory. I
> have been informed by Thomas Munro -- in very polite terminology --
> that this was a terrible decision which is causing all kinds of
> problems for users. I haven't actually encountered that situation
> myself, but I don't doubt that it's an issue.
>
> I think it's a lot easier to do better when we're talking about
> maintenance commands rather than queries. Maintenance operations
> typically don't have the problem that queries do with an unknown
> number of nodes using memory; you typically know all of your memory
> needs up front. So it's easier to budget that out across workers or
> whatever. It's a little harder in this case, because you could have
> any number of GIN indexes (1 to infinity) and the amount of memory you
> can use depends not only on how many of them there are but,
> presumably, also on the number of those that are going to be vacuumed at
> the same time. So you might have 8 indexes, 3 workers, and 2 of the
> indexes are GIN. In that case, you know that you can't have more than
> 2 GIN indexes being processed at the same time, but it's likely to be
> only one, and maybe with proper scheduling you could make sure it's
> only one. On the other hand, if you dole out the memory assuming it's
> only 1, what happens if you start that one, then process all 6 of the
> non-GIN indexes, and that one isn't done yet? I guess you could wait
> to start cleanup on the other GIN indexes until the previous index
> cleanup finishes, but that kinda sucks too. So I'm not really sure how
> to handle this particular case. I think the principle of dividing up
> the memory rather than just using more is probably a good one, but
> figuring out exactly how that should work seems tricky.
>

Yeah, and what if we have as many workers as indexes? Doing the
cleanup of GIN indexes serially (waiting for the prior index to finish
before starting the cleanup of the next GIN index) would be bad in
that case too. I think we can do something simple: choose the minimum
among the number of GIN indexes, the number of workers requested for
parallel vacuum, and max_parallel_maintenance_workers, and then divide
maintenance_work_mem by that to get the memory available to each GIN
index. This has some caveats: we might not be able to launch the
number of workers we decided on, in which case we could probably have
computed a bigger value of work_mem for the GIN indexes. I think
whatever we pick here will be good for some cases and not-so-good for
others, so why not pick something general and simple?
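
To make the division concrete, here is a minimal sketch of that
computation in plain C. This is illustrative only, not actual
PostgreSQL code; the variable names and the Min3 helper are
hypothetical:

#include <stdio.h>

/*
 * A sketch of the proposed budgeting scheme: at most
 * min(#GIN indexes, #workers requested, max_parallel_maintenance_workers)
 * GIN indexes can be cleaned up concurrently, so divide
 * maintenance_work_mem by that count to get the per-index budget.
 */
static int
Min3(int a, int b, int c)
{
    int m = (a < b) ? a : b;

    return (m < c) ? m : c;
}

int
main(void)
{
    int maintenance_work_mem = 64 * 1024;   /* in KB, i.e. 64MB */
    int num_gin_indexes = 2;                /* GIN indexes on the table */
    int nworkers_requested = 3;             /* workers requested for vacuum */
    int max_parallel_maintenance_workers = 8;

    int divisor = Min3(num_gin_indexes,
                       nworkers_requested,
                       max_parallel_maintenance_workers);
    int per_index_mem = maintenance_work_mem / divisor;

    printf("each GIN index cleanup may use up to %d KB\n", per_index_mem);
    return 0;
}

With Robert's example above (8 indexes, 3 workers, 2 of them GIN), the
divisor is min(2, 3, 8) = 2, so each GIN index cleanup would get half
of maintenance_work_mem.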

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
