Re: cost based vacuum (parallel)

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Mahendra Singh <mahi6run(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: cost based vacuum (parallel)
Date: 2019-11-15 10:14:18
Message-ID: CAFiTN-t62AqvWmkwrB831RGKNfB8PuhEq9n=P4ZbDPGyWo=L6w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 14, 2019 at 5:02 PM Mahendra Singh <mahi6run(at)gmail(dot)com> wrote:
>
> On Mon, 11 Nov 2019 at 17:56, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Nov 11, 2019 at 5:14 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Nov 11, 2019 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > ..
> > > > > I have tested the same with some other workload (test file attached).
> > > > > I can see the same behaviour with this workload as well: with
> > > > > patch 4 the distribution of the delay is better compared to the other
> > > > > patches, i.e. workers with more I/O have more delay and workers with
> > > > > equal I/O have almost equal delay. The only thing is that the total
> > > > > delay with patch 4 is slightly less compared to the other patches.
> > > > >
> > > >
> > > > I see one problem with the formula you have used in the patch, maybe
> > > > that is causing the value of total delay to go down.
> > > >
> > > > - if (new_balance >= VacuumCostLimit)
> > > > + VacuumCostBalanceLocal += VacuumCostBalance;
> > > > + if ((new_balance >= VacuumCostLimit) &&
> > > > + (VacuumCostBalanceLocal > VacuumCostLimit/(0.5 * nworker)))
> > > >
> > > > As per discussion, the second part of the condition should be
> > > > "VacuumCostBalanceLocal > (0.5) * VacuumCostLimit/nworker".
> > > My bad.
> > > > I think you can once change this and try again. Also, please try with the
> > > > different values of threshold (0.3, 0.5, 0.7, etc.).
> > > >
> > > Okay, I will retest with both patch 3 and patch 4 for both the scenarios.
> > > I will also try with different multipliers.
> > >
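
To make the intended check concrete, here is a minimal standalone
sketch of the corrected condition (the helper name, the hard-coded
settings, and the returned-delay shape are mine for illustration; the
real patch does this on shared memory inside vacuum_delay_point):

static double VacuumCostDelay = 10.0;     /* ms, per the test config */
static int    VacuumCostLimit = 2000;     /* per the test config */
static int    VacuumCostBalanceLocal = 0; /* this worker's contribution */

/*
 * Illustrative helper (not from the patch): given the updated shared
 * balance and the cost this worker just incurred, return how long the
 * worker should sleep, in milliseconds.
 */
static double
compute_worker_delay(int new_balance, int cost_incurred, int nworker)
{
    double  msec = 0;

    VacuumCostBalanceLocal += cost_incurred;

    /*
     * Sleep only when the shared balance has crossed the limit AND this
     * worker has itself contributed at least half of its fair share
     * (VacuumCostLimit / nworker) since it last slept, so that workers
     * doing more I/O sleep proportionally more.
     */
    if (new_balance >= VacuumCostLimit &&
        VacuumCostBalanceLocal > 0.5 * VacuumCostLimit / nworker)
    {
        msec = VacuumCostDelay * new_balance / VacuumCostLimit;
        if (msec > VacuumCostDelay * 4)
            msec = VacuumCostDelay * 4;   /* cap, as vacuum_delay_point does */

        VacuumCostBalanceLocal = 0;       /* this worker has paid its share */
    }
    return msec;
}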
> >
> > One more thing: I think we should also test these cases with a varying
> > number of indexes (say 2, 6, 8, etc.), and then we should probably test
> > with a varying number of workers where the number of workers is less
> > than the number of indexes. You can do these after finishing your
> > previous experiments.
>
> On top of the parallel vacuum patch, I applied Dilip's patch (0001-vacuum_costing_test.patch). I have tested by varying the number of indexes and the number of workers, comparing shared costing (0001-vacuum_costing_test.patch) against the latest shared costing patch (shared_costing_plus_patch4_v1.patch).
> With the shared costing base patch, I can see that the delay is not in sync with the I/O, which is resolved by applying shared_costing_plus_patch4_v1.patch. I have also observed that the total delay is slightly reduced with the shared_costing_plus_patch4_v1.patch patch.
>
> Below is the full testing summary:
> Test setup:
> step1) Apply parallel vacuum patch
> step2) Apply the 0001-vacuum_costing_test.patch patch (on top of this patch, the delay is not in sync with the I/O)
> step3) Apply shared_costing_plus_patch4_v1.patch (the delay is in sync with the I/O)
>
> Configuration settings:
> autovacuum = off
> max_parallel_workers = 30
> shared_buffers = 2GB
> max_parallel_maintenance_workers = 20
> vacuum_cost_limit = 2000
> vacuum_cost_delay = 10
>
> Test 1: Vary indexes (2, 4, 6, 8) while parallel workers are fixed at 4:
>
> Case 1) When indexes are 2:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 1 delay=60.000000 total io=17931 hit=17891 miss=0 dirty=2
>
> With shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=87.780000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 1 delay=87.995000 total io=17931 hit=17891 miss=0 dirty=2
>
> Case 2) When indexes are 4:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 1 delay=80.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 2 delay=60.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 3 delay=100.000000 total io=17931 hit=17891 miss=0 dirty=2
>
> With shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=87.430000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 1 delay=87.175000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 2 delay=86.340000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 3 delay=88.020000 total io=17931 hit=17891 miss=0 dirty=2
>
> Case 3) When indexes are 6:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=110.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 1 delay=100.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 2 delay=160.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 3 delay=90.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 4 delay=80.000000 total io=17931 hit=17891 miss=0 dirty=2
>
> With shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=173.195000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 1 delay=88.715000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 2 delay=87.710000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 3 delay=86.460000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 4 delay=89.435000 total io=17931 hit=17891 miss=0 dirty=2
>
> Case 4) When indexes are 8:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 1 delay=120.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 2 delay=130.000000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 3 delay=190.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 4 delay=110.000000 total io=35862 hit=35782 miss=0 dirty=4
>
> With shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=174.700000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 1 delay=177.880000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 2 delay=89.460000 total io=17931 hit=17891 miss=0 dirty=2
> WARNING: worker 3 delay=177.320000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 4 delay=86.810000 total io=17931 hit=17891 miss=0 dirty=2
>
> Test 2: Indexes are 16 while parallel workers are 2, 4, and 8:
>
> Case 1) When 2 parallel workers:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=1513.230000 total io=307197 hit=85167 miss=22179 dirty=12
> WARNING: worker 1 delay=1543.385000 total io=326553 hit=63133 miss=26322 dirty=10
> WARNING: worker 2 delay=1633.625000 total io=302199 hit=65839 miss=23616 dirty=10
>
> With shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=1539.475000 total io=308175 hit=65175 miss=24280 dirty=10
> WARNING: worker 1 delay=1251.200000 total io=250692 hit=71562 miss=17893 dirty=10
> WARNING: worker 2 delay=1143.690000 total io=228987 hit=93857 miss=13489 dirty=12
>
> Case 2) When 4 parallel workers:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=1182.430000 total io=213567 hit=16037 miss=19745 dirty=4
> WARNING: worker 1 delay=1202.710000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING: worker 2 delay=210.000000 total io=89655 hit=89455 miss=0 dirty=10
> WARNING: worker 3 delay=270.000000 total io=71724 hit=71564 miss=0 dirty=8
> WARNING: worker 4 delay=851.825000 total io=188229 hit=58619 miss=12945 dirty=8
>
> With shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=1136.875000 total io=227679 hit=14469 miss=21313 dirty=4
> WARNING: worker 1 delay=973.745000 total io=196881 hit=17891 miss=17891 dirty=4
> WARNING: worker 2 delay=447.410000 total io=89655 hit=89455 miss=0 dirty=10
> WARNING: worker 3 delay=833.235000 total io=168228 hit=40958 miss=12715 dirty=6
> WARNING: worker 4 delay=683.200000 total io=136488 hit=64368 miss=7196 dirty=8
>
> Case 3) When 8 parallel workers:
> Without shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=1022.300000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING: worker 1 delay=1072.770000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING: worker 2 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 3 delay=170.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 4 delay=140.035000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 5 delay=200.000000 total io=53802 hit=53672 miss=1 dirty=6
> WARNING: worker 6 delay=130.000000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 7 delay=150.000000 total io=53793 hit=53673 miss=0 dirty=6
>
> With shared_costing_plus_patch4_v1.patch:
> WARNING: worker 0 delay=872.800000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING: worker 1 delay=885.950000 total io=178941 hit=1 miss=17890 dirty=2
> WARNING: worker 2 delay=175.680000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 3 delay=259.560000 total io=53793 hit=53673 miss=0 dirty=6
> WARNING: worker 4 delay=169.945000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 5 delay=613.845000 total io=125100 hit=45750 miss=7923 dirty=6
> WARNING: worker 6 delay=171.895000 total io=35862 hit=35782 miss=0 dirty=4
> WARNING: worker 7 delay=176.505000 total io=35862 hit=35782 miss=0 dirty=4

It seems that the bigger delay difference (8%-9%), which is observed
with the higher number of indexes, is due to the I/O difference; for
example, in case 3 the total page misses without the patch are 35780
whereas with the patch they are 43703. So it seems that with more
indexes your data is not fitting in the shared buffers, so the page
hits/misses vary from run to run, and that causes variance in the
total delay. (Assuming the default vacuum_cost_page_miss = 10 and
vacuum_cost_page_hit = 1, those ~7900 extra misses alone add about
7900 * 9 cost units, i.e. roughly 7900 * 9 * 10ms / 2000 ~= 355 msec
of extra delay, which is the right order of magnitude for the
observed difference.)

The other problem, where the delay with the patch is 2-3% less, is
really a problem of the "0001-vacuum_costing_test" patch, because
that patch only displays the delay during the index vacuuming phase,
not the total delay. So if we observe the total delay, it should be
the same. The modified version of 0001-vacuum_costing_test is
attached to print the total delay.
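
For reference, the accounting change amounts to something like the
following sketch (the function name and the sleep stub are mine for
illustration, not lifted from the attached patch):

#include <unistd.h>

/* Run-wide accumulator behind the VacuumCostTotalDelay output below
 * (illustrative; the attached patch may spell this differently). */
static double VacuumCostTotalDelay = 0.0;   /* milliseconds */

static void
vacuum_sleep(double msec)
{
    usleep((useconds_t) (msec * 1000.0));   /* stand-in for pg_usleep() */

    /*
     * Accumulate into the run-wide total instead of a phase-local
     * counter, so that heap scanning, index vacuuming, and heap
     * truncation are all included in the printed total delay.
     */
    VacuumCostTotalDelay += msec;
}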

In my test.sh, I can see the total delay is almost the same.

Non-parallel vacuum
WARNING: VacuumCostTotalDelay = 11332.170000

Parallel vacuum with shared_costing_plus_patch4_v1.patch:
WARNING: worker 0 delay=89.230000 total io=17931 hit=17891 miss=0 dirty=2
WARNING: worker 1 delay=85.205000 total io=17931 hit=17891 miss=0 dirty=2
WARNING: worker 2 delay=87.290000 total io=17931 hit=17891 miss=0 dirty=2
WARNING: worker 3 delay=78.365000 total io=16378 hit=4318 miss=0 dirty=603

WARNING: VacuumCostTotalDelay = 11331.690000

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-vacuum_costing_test_total_delay.patch application/octet-stream 9.4 KB
