Re: [HACKERS] Block level parallel vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mahendra Singh <mahi6run(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-10-24 14:41:44
Message-ID: CAD21AoBY4W2aTye4zG2cAaBeT0TqknmTJuij=fXPXz4MFGiF-A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 24, 2019 at 3:21 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Oct 18, 2019 at 12:18 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 18, 2019 at 11:25 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Oct 18, 2019 at 8:45 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Oct 17, 2019 at 4:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Oct 17, 2019 at 3:25 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Thu, Oct 17, 2019 at 2:12 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 17, 2019 at 5:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > > > >
> > > > > > > > Another point in this regard is that the user anyway has an option to
> > > > > > > > turn off the cost-based vacuum. By default, it is anyway disabled.
> > > > > > > > So, if the user enables it we have to provide some sensible behavior.
> > > > > > > > If we can't come up with anything, then, in the end, we might want to
> > > > > > > > turn it off for a parallel vacuum and mention the same in docs, but I
> > > > > > > > think we should try to come up with a solution for it.
> > > > > > >
> > > > > > > I finally got your point and now understand the need. The idea I
> > > > > > > proposed doesn't work well here.
> > > > > > >
> > > > > > > So you mean that all workers share the cost counter, and if a
> > > > > > > parallel vacuum worker increases the cost and it reaches the limit,
> > > > > > > only that one worker sleeps? Is that okay even though the other
> > > > > > > parallel workers are still running, so the sleep might not help?
> > > > > > >
> > > > >
> > > > > Remember that the other running workers will also increase
> > > > > VacuumCostBalance and whichever worker finds that it becomes greater
> > > > > than VacuumCostLimit will reset its value and sleep. So, won't this
> > > > > make sure that overall throttling works the same?
> > > > >
> > > > > > I agree with this point. There is a possibility that some of the
> > > > > > workers who are doing heavy I/O continue to work while, OTOH, other
> > > > > > workers who are doing very little I/O might become the victims and
> > > > > > unnecessarily delay their operations.
> > > > > >
> > > > >
> > > > > Sure, but will it impact the overall I/O? I mean to say the rate
> > > > > limit we want to provide for the overall vacuum operation will still
> > > > > be the same. Also, doesn't a similar thing happen now as well, where
> > > > > the heap might have done a major portion of the I/O, but soon after
> > > > > we start vacuuming the index, we hit the limit and sleep?
> > > >
> > > > Actually, what I meant is that the worker who is performing actual I/O
> > > > might not go for the delay, while another worker which has done only
> > > > CPU work might pay the penalty. So basically the worker who is doing a
> > > > CPU-intensive operation might go for the delay and pay the penalty,
> > > > while the worker who is performing actual I/O continues to work and
> > > > does further I/O. Do you think this is not a practical problem?
> > > >
> > >
> > > I don't know. Generally, we try to delay (if required) before
> > > processing (read/write) one page which means it will happen for I/O
> > > intensive operations, so I am not sure if the point you are making is
> > > completely correct.
> >
> > Ok, I agree with the point that we are checking it only when we are
> > doing the I/O operation. But we also need to consider that each I/O
> > operation has a different weight. So even if we have a delay
> > point at the I/O operation, there is a possibility that we might delay
> > the worker which is just performing a buffer read with a page
> > hit (VacuumCostPageHit = 1), while the other worker which is actually
> > dirtying the page (VacuumCostPageDirty = 20) continues to work and does
> > more I/O.
> >
> > >
> > > > Stepping back a bit, OTOH, I think that we cannot guarantee that the
> > > > one worker who has done more I/O will continue to do further I/O and
> > > > the one which has not done much I/O will not perform more I/O in
> > > > the future. So it might not be too bad if we compute shared costs as
> > > > you suggested above.
> > > >
> > >
> > > I am thinking that if we can write patches for both approaches (a.
> > > compute shared costs and delay based on that, b. divide
> > > the I/O cost among the workers as described in the email above [1]) and
> > > do some tests to see the behavior of throttling, that might help us in
> > > deciding what is the best strategy to solve this problem, if any.
> > > What do you think?
> >
> > I agree with this idea. I can come up with a POC patch for approach
> > (b). Meanwhile, if someone is interested in quickly hacking on
> > approach (a), then we can do some testing and compare. Sawada-san,
> > by any chance would you be interested in writing a POC for approach (a)?
> > Otherwise, I will try to write it after finishing the first one
> > (approach b).
> >
> I have come up with the POC for approach (b).
>
> The idea is:
> 1) Before launching the workers, divide the current VacuumCostBalance
> among the workers so that the workers start accumulating their balance
> from that point.
> 2) Also, divide the VacuumCostLimit among the workers.
> 3) Once the workers are done with the index vacuum, they send back their
> remaining balance to the leader.
> 4) The leader sums all the balances and adds that to its current
> VacuumCostBalance, and starts accumulating its balance from this
> point.
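
A minimal sketch of what those four steps could look like in C;
LVCostShared and its fields are hypothetical names, not the attached POC:

/* Sketch only; the shared struct would live in the parallel vacuum DSM. */
typedef struct LVCostShared
{
    int     cost_limit_per_worker;
    int     cost_balance_per_worker;
    int     worker_cost_balance[FLEXIBLE_ARRAY_MEMBER];
} LVCostShared;

/* Steps 1 and 2: split the balance and the limit before launching workers. */
static void
divide_vacuum_cost(LVCostShared *shared, int nworkers)
{
    int     nparticipants = nworkers + 1;   /* workers plus the leader */

    shared->cost_balance_per_worker = VacuumCostBalance / nparticipants;
    shared->cost_limit_per_worker = VacuumCostLimit / nparticipants;

    /* The leader also runs the parallel phase with its own slice. */
    VacuumCostBalance = shared->cost_balance_per_worker;
    VacuumCostLimit = shared->cost_limit_per_worker;
}

/* Steps 3 and 4: fold the workers' leftover balances back into the leader. */
static void
gather_vacuum_cost(LVCostShared *shared, int nworkers_launched)
{
    for (int i = 0; i < nworkers_launched; i++)
        VacuumCostBalance += shared->worker_cost_balance[i];
}

Restoring the leader's full VacuumCostLimit after the parallel phase is
omitted here for brevity.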
>
> I was trying to test the behaviour of the vacuum I/O limit, but
> I could not find an easy way to test it, so I just put tracepoints
> in the code and checked at what point we are applying the delay.
> I also printed the cost balance at various points to see after how
> much I/O accumulation we are hitting the delay. Please feel free to
> suggest a better way to test this.
>
> I have printed these logs for the parallel vacuum patch (v30) vs. v30 +
> the patch for dividing the I/O limit (attached with this mail).
>
> Note: Patch and the test results are attached.
>

Thank you!

For approach (a), the basic idea I've come up with is that we have a
shared balance value in DSM, and each worker, including the leader
process, adds its local balance value to it in vacuum_delay_point;
the workers then sleep based on that shared value. I'll submit that
patch with other updates.
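
As a rough illustration of that idea (not the actual patch;
VacuumSharedCostBalance is a placeholder for the atomic counter in DSM,
and the reset semantics are simplified), vacuum_delay_point might be
adapted along these lines:

/* Sketch only, assuming a pg_atomic_uint32 placed in the parallel DSM. */
static pg_atomic_uint32 *VacuumSharedCostBalance = NULL;

void
vacuum_delay_point(void)
{
    if (!VacuumCostActive || InterruptPending || VacuumCostBalance <= 0)
        return;

    if (VacuumSharedCostBalance != NULL)
    {
        uint32      total;

        /* Fold the locally accumulated balance into the shared counter. */
        total = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
                                        (uint32) VacuumCostBalance);
        VacuumCostBalance = 0;

        if (total >= (uint32) VacuumCostLimit)
        {
            /* This worker resets the shared counter and takes the nap. */
            pg_atomic_sub_fetch_u32(VacuumSharedCostBalance, total);
            pg_usleep((long) (VacuumCostDelay * 1000));
        }
    }
    else if (VacuumCostBalance >= VacuumCostLimit)
    {
        /* Non-parallel vacuum: behave as before. */
        VacuumCostBalance = 0;
        pg_usleep((long) (VacuumCostDelay * 1000));
    }
}

Whether subtracting the observed total or doing a compare-and-swap reset
is the right semantics is exactly the kind of thing the throttling tests
should tell us.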

Regards,

--
Masahiko Sawada
