Re: [HACKERS] Block level parallel vacuum

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-10-24 11:03:31
Message-ID: CAFiTN-u9D9hTQhzqxxUUCvk0hcWekyKftE8rZYbq1B6O58KA9Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 24, 2019 at 4:21 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Oct 24, 2019 at 11:51 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > I am thinking if we can write the patch for both the approaches (a.
> > > > compute shared costs and try to delay based on that, b. try to divide
> > > > the I/O cost among workers as described in the email above[1]) and do
> > > > some tests to see the behavior of throttling, that might help us in
> > > > deciding what is the best strategy to solve this problem, if any.
> > > > What do you think?
> > >
> > > I agree with this idea. I can come up with a POC patch for approach
> > > (b). Meanwhile, if someone is interested to quickly hack with the
> > > approach (a) then we can do some testing and compare. Sawada-san,
> > > by any chance will you be interested to write POC with approach (a)?
> > > Otherwise, I will try to write it after finishing the first one
> > > (approach b).
> > >
> > I have come up with the POC for approach (a).
> >
>
> I think you mean to say approach (b).

Yeah, sorry for the confusion. It's approach (b).
>
> > The idea is
> > 1) Before launching the worker divide the current VacuumCostBalance
> > among workers so that workers start accumulating the balance from that
> > point.
> > 2) Also, divide the VacuumCostLimit among the workers.
> > 3) Once the workers are done with the index vacuum, send back the
> > remaining balance to the leader.
> > 4) The leader will sum all the balances and add that to its current
> > VacuumCostBalance. And start accumulating its balance from this
> > point.
> >
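
The four steps above can be sketched roughly as follows. This is an illustration in Python, not the code from the attached patch (which is C). The function names, the integer rounding rule, and the minimum per-worker limit of 1 are all assumptions made for the example; only VacuumCostBalance and VacuumCostLimit come from the discussion.

```python
def split_evenly(total, nworkers):
    """Split an integer total into nworkers shares that sum back to total."""
    if nworkers < 1:
        raise ValueError("nworkers must be at least 1")
    base, extra = divmod(total, nworkers)
    # The first `extra` workers get one unit more, so nothing is lost to rounding.
    return [base + (1 if i < extra else 0) for i in range(nworkers)]


def launch_shares(cost_balance, cost_limit, nworkers):
    """Steps 1 and 2: hand each worker a share of the balance and of the limit."""
    balances = split_evenly(cost_balance, nworkers)
    # Assumption: never give a worker a limit below 1, so it can still be throttled.
    limits = [max(1, share) for share in split_evenly(cost_limit, nworkers)]
    return balances, limits


def collect_balances(leader_balance, worker_remaining):
    """Steps 3 and 4: the leader adds the balances the workers send back."""
    return leader_balance + sum(worker_remaining)


# Illustrative numbers only: balance 150 and limit 200 split over 4 workers.
balances, limits = launch_shares(150, 200, 4)
leader_balance = collect_balances(0, [10, 5, 0, 20])
```

With these inputs the worker balances still sum to 150 and each worker is throttled against a limit of 50, so the combined threshold stays at the configured limit as long as all workers are running.
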
> > I was trying to test how the vacuum I/O limit behaves, but
> > I could not find an easy way to test that, so I just put a tracepoint
> > in the code and checked at what point we are applying the
> > delay.
> > I also printed the cost balance at various points to see after how
> > much I/O accumulation we are hitting the delay. Please feel free to
> > suggest a better way to test this.
> >
>
> Can we compute the overall throttling (sleep time) in the operation
> separately for heap and index, then divide the index's sleep_time with
> a number of workers and add it to heap's sleep time? Then, it will be
> a bit easier to compare the data between parallel and non-parallel
> case.

Okay, I will try to do that.
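
The comparison metric suggested above can be written down as a small sketch. The function name and the millisecond unit are assumptions for illustration; the formula itself is just the one described in the quoted text (heap sleep time plus index sleep time divided by the number of workers).

```python
def comparable_sleep_ms(heap_sleep_ms, index_sleep_ms, nworkers):
    """Return one sleep-time figure comparable across parallel and serial runs.

    heap_sleep_ms:  total throttling sleep during the heap phase
    index_sleep_ms: total throttling sleep summed over all index-vacuum workers
    nworkers:       number of processes that vacuumed indexes (1 if non-parallel)
    """
    if nworkers < 1:
        raise ValueError("nworkers must be at least 1")
    # Workers sleep concurrently, so their summed sleep is divided by the
    # worker count before being added to the heap phase.
    return heap_sleep_ms + index_sleep_ms / nworkers


# Illustrative numbers only, not measurements from the attached test results.
parallel = comparable_sleep_ms(1000, 800, 4)
serial = comparable_sleep_ms(1000, 800, 1)
```

If the throttling behaves the same in both modes, the figure for a parallel run should come out close to the one measured for a non-parallel run over the same data.
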
>
> > I have printed these logs for the parallel vacuum patch (v30) vs. (v30) +
> > the patch for dividing the I/O limit (attached with the mail)
> >
> > Note: Patch and the test results are attached.
> >
>
> I think it is always a good idea to summarize the results and tell
> your conclusion about it. AFAICT, it seems to me this technique as
> done in patch might not work for the cases when there is an uneven
> amount of work done by parallel workers (say the index sizes vary
> (maybe due to partial indexes or index column width or some other
> reasons)). The reason for it is that when the worker finishes its
> work we don't rebalance the cost among other workers.
Right, that's one problem I observed.
> Can we generate
> such a test and see how it behaves? I think it might be possible to
> address this if it turns out to be a problem.
Yeah, we can address this by rebalancing the cost.
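
One possible shape for such rebalancing, as a hedged sketch only: the thread does not say how it would be implemented, so the function name, the worker identifiers, and the even-split policy below are all hypothetical. The idea shown is simply that whenever a worker finishes, the full cost limit is redistributed over the workers that are still running.

```python
def rebalance_limits(total_limit, active_workers):
    """Redistribute total_limit evenly over the workers that are still active.

    Returns a dict mapping worker id to its new per-worker cost limit.
    """
    workers = sorted(active_workers)
    if not workers:
        return {}
    base, extra = divmod(total_limit, len(workers))
    # Assumption: keep each limit at least 1 so every worker can still be throttled.
    return {
        w: max(1, base + (1 if i < extra else 0))
        for i, w in enumerate(workers)
    }


# Four workers start with an even split of an illustrative limit of 200.
active = {"w1", "w2", "w3", "w4"}
before = rebalance_limits(200, active)

# Worker w2 finishes its (small) index early; the rest absorb its share.
active.discard("w2")
after = rebalance_limits(200, active)
```

Without this step the three remaining workers would keep running against a quarter of the limit each, so the vacuum as a whole would be throttled more than configured for the rest of the index phase.
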

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
