RE: [Patch] Optimize dropping of relation buffers using dlist

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "'jankirk(dot)jamison(at)gmail(dot)com'" <jankirk(dot)jamison(at)gmail(dot)com>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2021-01-02 14:17:49
Message-ID: OSBPR01MB234185B4C13D9552023625B3EFD40@OSBPR01MB2341.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wednesday, December 30, 2020 8:58 PM, Amit Kapila wrote:
> On Wed, Dec 30, 2020 at 11:28 AM Tang, Haiying
> <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> >
> > Hi Amit,
> >
> > In last
> > mail(https://www.postgresql.org/message-id/66851e198f6b41eda59e6257182564b6%40G08CNEXMBPEKD05.g08.fujitsu.local),
> > I've sent you the performance test results(run only 1 time) on single table.
> > Here is my the retested results(average by 15 times) which I think is more
> > accurate.
> >
> > In terms of 20G and 100G, the optimization on 100G is linear, but 20G is
> > nonlinear(also include test results on shared buffers of 50G/60G), so it's a
> > little difficult to decide the threshold from the two for me.
> > If just consider 100G, I think NBuffers/32 is the optimized max relation size.
> > But I don't know how to judge for 20G. If you have any suggestion, kindly let
> > me know.
> >
>
> Considering these results NBuffers/64 seems a good threshold as beyond
> that there is no big advantage. BTW, it is not clear why the advantage for
> single table is not as big as multiple tables with the Truncate command. Can
> you share your exact test steps for any one of the tests?
> Also, did you change autovacumm = off for these tests, if not then the results
> might not be reliable because before you run the test via Vacuum command
> autovacuum would have done that work?

Happy new year. The V38 patch LGTM.
Apologies for the slight delay in posting the test results, but since it's the
start of the commitfest, here they are, and the results are interesting.

I executed a VACUUM test using the same approach that Tsunakawa-san did in [1],
but this time I measured only the total time that DropRelFileNodeBuffers() took.
I used only a single relation and tried various relation sizes corresponding to
the threshold values NBuffers/512 .. NBuffers/1, as advised by Amit.

Example of relation sizes for NBuffers/512.
100GB shared_buffers: 200 MB
20GB shared_buffers: 40 MB
1GB shared_buffers: 2 MB
128MB shared_buffers: 0.25 MB
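
(For reference, the numbers below are the elapsed time of the
DropRelFileNodeBuffers() call itself. A minimal sketch of that kind of
instrumentation, assuming instr_time.h macros around the call, is shown
here for illustration only; the actual harness follows [1], and the
function's argument list differs between master and the patched build,
so the call is elided.)

#include "postgres.h"
#include "portability/instr_time.h"

static void
time_drop_rel_buffers(void)
{
	instr_time	start;
	instr_time	elapsed;

	INSTR_TIME_SET_CURRENT(start);

	/* DropRelFileNodeBuffers(...);  -- the call being measured */

	INSTR_TIME_SET_CURRENT(elapsed);
	INSTR_TIME_SUBTRACT(elapsed, start);

	elog(LOG, "DropRelFileNodeBuffers: %.6f s",
		 INSTR_TIME_GET_DOUBLE(elapsed));
}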

The regression, meaning the patch performs worse than master, only happens
for relation sizes of NBuffers/2 and larger (NBuffers/2 and NBuffers/1), for
all shared_buffers settings. The fastest performance on a single relation was
with relation size NBuffers/512.

[VACUUM Recovery Performance on Single Relation]
Legend: P_XXX (Patch, NBuffers/XXX relation size),
M_XXX (Master, NBuffers/XXX relation size)
Unit: seconds

| Rel Size | 100 GB s_b | 20 GB s_b | 1 GB s_b | 128 MB s_b |
|----------|------------|------------|------------|------------|
| P_512 | 0.012594 | 0.001989 | 0.000081 | 0.000012 |
| M_512 | 0.208757 | 0.046212 | 0.002013 | 0.000295 |
| P_256 | 0.026311 | 0.004416 | 0.000129 | 0.000021 |
| M_256 | 0.241017 | 0.047234 | 0.002363 | 0.000298 |
| P_128 | 0.044684 | 0.009784 | 0.000290 | 0.000042 |
| M_128 | 0.253588 | 0.047952 | 0.002454 | 0.000319 |
| P_64 | 0.065806 | 0.017444 | 0.000521 | 0.000075 |
| M_64 | 0.268311 | 0.050361 | 0.002730 | 0.000339 |
| P_32 | 0.121441 | 0.033431 | 0.001646 | 0.000112 |
| M_32 | 0.285254 | 0.061486 | 0.003640 | 0.000364 |
| P_16 | 0.255503 | 0.065492 | 0.001663 | 0.000144 |
| M_16 | 0.377013 | 0.081613 | 0.003731 | 0.000372 |
| P_8 | 0.560616 | 0.109509 | 0.005954 | 0.000465 |
| M_8 | 0.692596 | 0.112178 | 0.006667 | 0.000553 |
| P_4 | 1.109437 | 0.162924 | 0.011229 | 0.000861 |
| M_4 | 1.162125 | 0.178764 | 0.011635 | 0.000935 |
| P_2 | 2.202231 | 0.317832 | 0.020783 | 0.002646 |
| M_2 | 1.583959 | 0.306269 | 0.015705 | 0.002021 |
| P_1 | 3.080032 | 0.632747 | 0.032183 | 0.002660 |
| M_1 | 2.705485 | 0.543970 | 0.030658 | 0.001941 |

%reg = (Patched - Master)/Patched
(e.g. for NBuffers/512 with 100 GB shared_buffers:
(0.012594 - 0.208757)/0.012594 = -1557.6%)

| %reg_relsize | 100 GB s_b | 20 GB s_b | 1 GB s_b | 128 MB s_b |
|--------------|------------|------------|------------|------------|
| %reg_512 | -1557.587% | -2223.006% | -2385.185% | -2354.167% |
| %reg_256 | -816.041% | -969.691% | -1731.783% | -1319.048% |
| %reg_128 | -467.514% | -390.123% | -747.008% | -658.333% |
| %reg_64 | -307.727% | -188.704% | -423.992% | -352.000% |
| %reg_32 | -134.891% | -83.920% | -121.097% | -225.970% |
| %reg_16 | -47.557% | -24.614% | -124.279% | -157.390% |
| %reg_8 | -23.542% | -2.437% | -11.967% | -19.010% |
| %reg_4 | -4.749% | -9.722% | -3.608% | -8.595% |
| %reg_2 | 28.075% | 3.638% | 24.436% | 23.615% |
| %reg_1 | 12.160% | 14.030% | 4.739% | 27.010% |

Since our goal is to find the approximate threshold at which the cost of
finding the to-be-invalidated buffers becomes higher in the optimized path
than in the traditional path:
A. Traditional Path
1. For each buffer in shared_buffers, compare the relfilenode.
2. LockBufHdr()
3. Compare the block number; InvalidateBuffer() if it's a target.
B. Optimized Path
1. For each block in the relation, LWLockAcquire(), BufTableLookup(),
and LWLockRelease().
2-3. Same as the traditional path.
(Step #1 of both paths is sketched in code below.)
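
The sketch is loosely modeled on DropRelFileNodeBuffers() in bufmgr.c and on
the patch's block-by-block lookup (illustration only; buffer-header locking,
pin checks and InvalidateBuffer() are omitted, and rnode, forkNum,
firstDelBlock and nblocks are placeholders):

/* A. Traditional path: scan every buffer header in shared_buffers. */
int			i;

for (i = 0; i < NBuffers; i++)
{
	BufferDesc *bufHdr = GetBufferDescriptor(i);

	/* step #1: relfilenode check, done NBuffers times */
	if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
		continue;

	/* steps #2-3: LockBufHdr(), block-number check, InvalidateBuffer() */
}

/* B. Optimized path: probe the buffer mapping table per target block. */
BlockNumber blk;

for (blk = firstDelBlock; blk < nblocks; blk++)
{
	BufferTag	tag;
	uint32		hash;
	LWLock	   *partLock;
	int			buf_id;

	/* step #1: hash-table lookup under the mapping partition lock */
	INIT_BUFFERTAG(tag, rnode.node, forkNum, blk);
	hash = BufTableHashCode(&tag);
	partLock = BufMappingPartitionLock(hash);

	LWLockAcquire(partLock, LW_SHARED);
	buf_id = BufTableLookup(&tag, hash);
	LWLockRelease(partLock);

	if (buf_id < 0)
		continue;			/* block not in shared_buffers */

	/* steps #2-3: same as the traditional path, on buffer buf_id */
}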

So we need to compare the cost of step #1 in each path, where the number of
buffers examined and the per-buffer check differ.
The cost of the optimized path exceeds that of the traditional path beyond
some threshold, i.e. when:

NBuffers * traditional_cost_for_each_buf_check <
InvalidatedBuffers * optimized_cost_for_each_buf_check

So the threshold value we want to estimate is InvalidatedBuffers, i.e. the
point where NBuffers * traditional / optimized < InvalidatedBuffers.

Example for 100 GB shared_buffers and relation size NBuffers/512:
100000 (MB) * 0.208757 (s) / 0.012594 (s) = 1,657,587 MB,
which is still above the 100,000 MB of shared_buffers.

| s_b (MB)    | 100000    | 20000   | 1000   | 128   |
|--------------|-----------|---------|--------|-------|
| NBuffers/512 | 1,657,587 | 464,601 | 24,852 | 3,141 |
| NBuffers/256 | 916,041 | 213,938 | 18,318 | 1,816 |
| NBuffers/128 | 567,514 | 98,025 | 8,470 | 971 |
| NBuffers/64 | 407,727 | 57,741 | 5,240 | 579 |
| NBuffers/32 | 234,891 | 36,784 | 2,211 | 417 |
| NBuffers/16 | 147,557 | 24,923 | 2,243 | 329 |
| NBuffers/8 | 123,542 | 20,487 | 1,120 | 152 |
| NBuffers/4 | 104,749 | 21,944 | 1,036 | 139 |
| NBuffers/2 | 71,925 | 19,272 | 756 | 98 |
| NBuffers/1 | 87,840 | 17,194 | 953 | 93 |
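
(Each cell above is just shared_buffers_in_MB * master_time / patched_time.
As a stand-alone sanity check, the small program below, with the NBuffers/512
timings for 100 GB shared_buffers hard-coded as an example, reproduces the
corresponding cell up to rounding of the displayed timings.)

#include <stdio.h>

int
main(void)
{
	double		sb_mb = 100000.0;	/* shared_buffers in MB (100 GB) */
	double		master = 0.208757;	/* M_512, seconds */
	double		patched = 0.012594; /* P_512, seconds */

	/* estimated break-even InvalidatedBuffers, expressed in MB */
	printf("break-even: %.0f MB\n", sb_mb * master / patched);
	return 0;
}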

Although the above table shows that NBuffers/2 would be the
threshold, I know that the cost will vary depending on the machine
specs. I think we can pick the threshold from among NBuffers/2,
NBuffers/4, or NBuffers/8, because their values are the closest
to the InvalidatedBuffers.

[postgresql.conf]
shared_buffers = 100GB #20GB,1GB,128MB
autovacuum = off
full_page_writes = off
checkpoint_timeout = 30min
max_locks_per_transaction = 10000

[Machine Specs Used]
Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
8 CPUs, 256GB Memory
XFS, RHEL7.2

Kindly let me know if you have comments regarding the results.

Regards,
Kirk Jamison

[1] https://www.postgresql.org/message-id/TYAPR01MB2990C4EFE63F066F83D2A603FEE70%40TYAPR01MB2990.jpnprd01.prod.outlook.com
