Re: cost based vacuum (parallel)

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-13 04:32:13
Message-ID: CA+fd4k5T5sJcoBtO02mVy=mBV4hzZwv5OVrn9jwtUAcc0Qi-XA@mail.gmail.com
Lists: pgsql-hackers

On Tue, 12 Nov 2019 at 20:22, Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Tue, 12 Nov 2019 at 19:08, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Nov 12, 2019 at 3:03 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Nov 12, 2019 at 10:47 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Mon, Nov 11, 2019 at 12:59 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Mon, Nov 11, 2019 at 9:43 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Fri, Nov 8, 2019 at 11:49 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Yeah, I think it is difficult to get the exact balance, but we
> > > > > > > > can try to be as close as possible. We can try to play with the
> > > > > > > > threshold, and another possibility is to sleep in proportion to
> > > > > > > > the amount of I/O done by the worker.
> > > > > > > I have done another experiment with two more changes on top of
> > > > > > > patch3:
> > > > > > > a) Only reduce the local balance from the total shared balance
> > > > > > > whenever the worker is applying the delay.
> > > > > > > b) Compute the delay based on the local balance.
> > > > > > >
> > > > > > > patch4:
> > > > > > > worker 0 delay=84.130000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 1 delay=89.230000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 2 delay=88.680000 total I/O=17931 hit=17891 miss=0 dirty=2
> > > > > > > worker 3 delay=80.790000 total I/O=16378 hit=4318 miss=0 dirty=603
> > > > > > >
> > > > > > > I think with this approach the delay is divided among the workers
> > > > > > > quite well compared to the other approaches.
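
A rough sketch of how I read that patch4 logic, just for illustration; apart
from the variables that appear in the diff quoted further down
(VacuumCostBalance, VacuumCostBalanceLocal, VacuumCostLimit), the names and
the sleep call below are my assumptions, not the actual patch code:

    /*
     * Sketch of the parallel path of vacuum_delay_point().
     * VacuumSharedCostBalance is assumed to be a pg_atomic_uint32 in
     * shared memory; VacuumCostBalanceLocal is the cost this worker has
     * accumulated since it last slept.
     */
    uint32      new_balance;
    double      msec;

    /* Add the cost accumulated since the last check to the shared balance. */
    new_balance = pg_atomic_add_fetch_u32(VacuumSharedCostBalance,
                                          VacuumCostBalance);
    VacuumCostBalanceLocal += VacuumCostBalance;
    VacuumCostBalance = 0;

    if (new_balance >= VacuumCostLimit)
    {
        /* (b) sleep in proportion to this worker's own balance ... */
        msec = VacuumCostDelay * VacuumCostBalanceLocal / VacuumCostLimit;
        pg_usleep((long) (msec * 1000));

        /* (a) ... and subtract only the local share from the shared balance. */
        pg_atomic_sub_fetch_u32(VacuumSharedCostBalance,
                                VacuumCostBalanceLocal);
        VacuumCostBalanceLocal = 0;
    }
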
> > > > > > >
> > > > > > > >
> > > > > ..
> > > > > > I have tested the same with another workload (test file attached).
> > > > > > I see the same behaviour with this workload as well: with patch 4
> > > > > > the distribution of the delay is better than with the other
> > > > > > patches, i.e. workers with more I/O have more delay and workers
> > > > > > with equal I/O have almost equal delay. The only thing is that the
> > > > > > total delay with patch 4 is slightly less than with the other
> > > > > > patches.
> > > > > >
> > > > >
> > > > > I see one problem with the formula you have used in the patch; maybe
> > > > > that is what is causing the total delay to go down.
> > > > >
> > > > > - if (new_balance >= VacuumCostLimit)
> > > > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > > > + if ((new_balance >= VacuumCostLimit) &&
> > > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > > > >
> > > > > As per discussion, the second part of the condition should be
> > > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker". I think
> > > > > you can change this and try again. Also, please try with different
> > > > > values of the threshold (0.3, 0.5, 0.7, etc.).
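
In code, the corrected check would read roughly like this (a sketch only;
whether the patch does this division in float or integer arithmetic is my
assumption):

    if ((new_balance >= VacuumCostLimit) &&
        (VacuumCostBalanceLocal > 0.5 * (VacuumCostLimit / nworker)))
    {
        /* compute the sleep time from VacuumCostBalanceLocal and sleep */
    }
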
> > > > >
> > > > I have modified patch4 and ran it with different values, but I don't
> > > > see much difference in the results. In fact, I removed the condition
> > > > for the local balance check completely and the delays are still the
> > > > same. I think this is because with patch4 workers only reduce their
> > > > own balance and delay in proportion to their local balance, so maybe
> > > > the second condition will not have much impact.
> > > >
> >
> > Yeah, but I suspect the condition you mentioned (only try to perform the
> > delay when the local balance exceeds a certain threshold) can have an
> > impact in some other scenarios, so it is better to retain it. I feel the
> > overall results look sane and the approach seems reasonable to me.
> >
> > > >
> > > I have revised patch4 so that it doesn't depend upon a fixed number of
> > > workers; instead I update the worker count dynamically.
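
I take "dynamically" to mean the delay computation looks up how many workers
are currently sharing the cost balance each time, rather than using a fixed
count; something roughly like the following, where VacuumActiveNWorkers is a
guessed name for a shared counter, not necessarily what the patch uses:

    /* Sketch: read the number of workers currently sharing the balance. */
    nworker = pg_atomic_read_u32(VacuumActiveNWorkers);
    Assert(nworker >= 1);       /* at least the current process */
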
> > >
> >
> > Thanks. Sawada-San, by any chance, can you try some of the tests done
> > by Dilip or some similar tests just to rule out any sort of
> > machine-specific dependency?
>
> Sure. I'll try it tomorrow.

I've done some tests while changing the shared buffer size, the delay and the
number of workers. The overall results have a similar tendency to the results
shared by Dilip and look reasonable to me.

* test.sh

shared_buffers = '4GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=89.315000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=88.860000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=89.290000 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=81.805000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '1GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=89.210000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=89.325000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=88.870000 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=81.735000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '512MB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=88.480000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=88.635000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=88.600000 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=81.660000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '512MB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 5;
worker 0 delay=447.725000 total io=17931 hit=17891 miss=0 dirty=2
worker 1 delay=445.850000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=445.125000 total io=17931 hit=17891 miss=0 dirty=2
worker 3 delay=409.025000 total io=16378 hit=4318 miss=0 dirty=603

shared_buffers = '512MB';
max_parallel_maintenance_workers = 2;
vacuum_cost_delay = 5;
worker 0 delay=854.750000 total io=34309 hit=22209 miss=0 dirty=605
worker 1 delay=446.500000 total io=17931 hit=17891 miss=0 dirty=2
worker 2 delay=444.175000 total io=17931 hit=17891 miss=0 dirty=2

---
* test1.sh

shared_buffers = '4GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=178.205000 total io=35828 hit=35788 miss=0 dirty=2
worker 1 delay=178.550000 total io=35828 hit=35788 miss=0 dirty=2
worker 2 delay=178.660000 total io=35828 hit=35788 miss=0 dirty=2
worker 3 delay=221.280000 total io=44322 hit=8352 miss=1199 dirty=1199

shared_buffers = '1GB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=178.035000 total io=35828 hit=35788 miss=0 dirty=2
worker 1 delay=178.535000 total io=35828 hit=35788 miss=0 dirty=2
worker 2 delay=178.585000 total io=35828 hit=35788 miss=0 dirty=2
worker 3 delay=221.465000 total io=44322 hit=8352 miss=1199 dirty=1199

shared_buffers = '512MB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 1;
worker 0 delay=1795.900000 total io=357911 hit=1 miss=35787 dirty=2
worker 1 delay=1790.700000 total io=357911 hit=1 miss=35787 dirty=2
worker 2 delay=179.000000 total io=35828 hit=35788 miss=0 dirty=2
worker 3 delay=221.355000 total io=44322 hit=8352 miss=1199 dirty=1199

shared_buffers = '512MB';
max_parallel_maintenance_workers = 6;
vacuum_cost_delay = 5;
worker 0 delay=8958.500000 total io=357911 hit=1 miss=35787 dirty=2
worker 1 delay=8950.000000 total io=357911 hit=1 miss=35787 dirty=2
worker 2 delay=894.150000 total io=35828 hit=35788 miss=0 dirty=2
worker 3 delay=1106.400000 total io=44322 hit=8352 miss=1199 dirty=1199

shared_buffers = '512MB';
max_parallel_maintenance_workers = 2;
vacuum_cost_delay = 5;
worker 0 delay=8956.500000 total io=357911 hit=1 miss=35787 dirty=2
worker 1 delay=8955.050000 total io=357893 hit=3 miss=35785 dirty=2
worker 2 delay=2002.825000 total io=80150 hit=44140 miss=1199 dirty=1201

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
