Re: cost based vacuum (parallel)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-12 10:07:52
Message-ID: CAA4eK1JSe4tZPy8cVMkLG51O6KRcSPHc_C5tCiczus_DicJ=3w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > >
> > > > > > Yeah, I think it is difficult to get the exact balance, but we can try
> > > > > > to be as close as possible. We can try to play with the threshold and
> > > > > > another possibility is to try to sleep in proportion to the amount of
> > > > > > I/O done by the worker.
> > > > > > I have done another experiment with 2 more changes on
> > > > > > top of patch3:
> > > > > a) Only reduce the local balance from the total shared balance
> > > > > whenever it's applying delay
> > > > > b) Compute the delay based on the local balance.
> > > > >
> > > > > patch4:
> > > > > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> > > > >
> > > > > > I think with this approach the delay is divided among the workers
> > > > > > quite well compared to the other approaches.
> > > > >
> > > > > >
> > > ..
> > > > I have tested the same with another workload (test file attached).
> > > > I see the same behaviour with this workload as well: with patch4
> > > > the distribution of the delay is better than with the other
> > > > patches, i.e. workers with more I/O have more delay and workers with
> > > > equal I/O have almost equal delay. The only thing is that the total
> > > > delay with patch4 is slightly less than with the other patches.
> > > >
> > >
> > > I see one problem with the formula you have used in the patch, maybe
> > > that is causing the value of total delay to go down.
> > >
> > > - if (new_balance >= VacuumCostLimit)
> > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > + if ((new_balance >= VacuumCostLimit) &&
> > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > >
> > > As per the discussion, the second part of the condition should be
> > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker". Could
> > > you change this and try again? Also, please try with
> > > different values of the threshold (0.3, 0.5, 0.7, etc.).
> > >
> > I have modified patch4 and run it with different values, but I
> > don't see much difference in the results. In fact, I removed the
> > local balance check completely and the delays are still the same.
> > I think this is because with patch4 the workers only reduce their
> > own balance and delay only as much as their local balance. So
> > maybe the second condition will not have much impact.
> >

Yeah, but I suspect the condition you mentioned (attempt the delay only
when the local balance exceeds a certain threshold) can have an impact
in some other scenarios. So, it is better to retain it. I feel the
overall results look sane and the approach seems reasonable to me.
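
To make the difference between the two threshold formulas concrete, here is
a minimal standalone sketch. It is not code from the patch: the function
names, the "fraction" parameter, and the delay formula (sleep time
proportional to the local balance, following point (b) of the patch4
description) are assumptions made for illustration only.

```c
#include <assert.h>

/*
 * Threshold as written in the quoted hunk:
 *     VacuumCostLimit / (0.5 * nworker)
 */
static double
threshold_as_patched(int cost_limit, int nworker)
{
    assert(nworker > 0);
    return cost_limit / (0.5 * nworker);
}

/*
 * Threshold as discussed in the thread:
 *     fraction * VacuumCostLimit / nworker
 * "fraction" is the tunable value (0.3, 0.5, 0.7, ...).
 */
static double
threshold_as_discussed(int cost_limit, int nworker, double fraction)
{
    assert(nworker > 0);
    return fraction * cost_limit / nworker;
}

/*
 * Hypothetical helper: a worker sleeps only when the shared balance has
 * reached the limit AND its own local balance exceeds the threshold.
 * Returns 1 if the worker should sleep, 0 otherwise.
 */
static int
should_delay(int shared_balance, int local_balance,
             int cost_limit, int nworker, double fraction)
{
    return shared_balance >= cost_limit &&
        local_balance > threshold_as_discussed(cost_limit, nworker, fraction);
}

/*
 * Hypothetical helper: sleep time in milliseconds computed from the
 * worker's local balance instead of the shared one (an assumption about
 * how point (b) could be implemented).
 */
static double
local_delay_msec(double cost_delay_msec, int local_balance, int cost_limit)
{
    assert(cost_limit > 0);
    return cost_delay_msec * local_balance / cost_limit;
}
```

With a cost limit of 200 and 4 workers, the formula from the hunk gives a
threshold of 100, while the discussed formula with 0.5 gives 25. So, under
these assumed numbers, the hunk's version requires four times as much local
balance before a worker is allowed to sleep.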

> >
> I have revised patch4 so that it doesn't depend upon a fixed
> number of workers; instead, I update the worker count
> dynamically.
>

Thanks. Sawada-San, by any chance, can you try some of the tests done
by Dilip or some similar tests just to rule out any sort of
machine-specific dependency?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
