Re: Improving connection scalability: GetSnapshotData()

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving connection scalability: GetSnapshotData()
Date: 2020-03-02 23:24:21
Message-ID: 20200302232421.2mlvu2nqoisqidpy@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-03-01 00:36:01 -0800, Andres Freund wrote:
> Here are some numbers for the submitted patch series. I had to cull some
> further improvements to make it more manageable, but I think the numbers
> still are quite convincing.
>
> The workload is a pgbench readonly, with pgbench -M prepared -c $conns
> -j $conns -S -n for each client count. This is on a machine with 2
> Intel(R) Xeon(R) Platinum 8168, but virtualized.
>
> conns  tps master      tps pgxact-split
>
> 1      26842.492845    26524.194821
> 10     246923.158682   249224.782661
> 50     695956.539704   709833.746374
> 100    1054727.043139  1903616.306028
> 200    964795.282957   1949200.338012
> 300    906029.377539   1927881.231478
> 400    845696.690912   1911065.369776
> 500    812295.222497   1926237.255856
> 600    888030.104213   1903047.236273
> 700    866896.532490   1886537.202142
> 800    863407.341506   1883768.592610
> 900    871386.608563   1874638.012128
> 1000   887668.277133   1876402.391502
> 1500   860051.361395   1815103.564241
> 2000   890900.098657   1775435.271018
> 3000   874184.980039   1653953.817997
> 4000   845023.080703   1582582.316043
> 5000   817100.195728   1512260.802371
>
> I think these are pretty nice results.
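
For anyone wanting to repeat the sweep quoted above, here is a minimal sketch that only builds the pgbench command line for each client count. The flags are the ones quoted in the message; the database name `postgres` and the helper names `pgbench_command` and `sweep_commands` are assumptions for illustration, and the run duration is not stated in the message, so none is set here.

```python
import shlex

# Client counts taken from the first column of the quoted table.
CLIENT_COUNTS = [1, 10, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900,
                 1000, 1500, 2000, 3000, 4000, 5000]


def pgbench_command(conns, dbname="postgres"):
    """Build the read-only pgbench invocation for one client count.

    dbname is an assumption; the message does not name the database.
    """
    return [
        "pgbench",
        "-M", "prepared",  # prepared-statement query mode
        "-c", str(conns),  # number of client connections
        "-j", str(conns),  # one pgbench worker thread per client
        "-S",              # built-in select-only script
        "-n",              # skip the vacuum before the run
        dbname,
    ]


def sweep_commands(counts=CLIENT_COUNTS, dbname="postgres"):
    """Return one shell-quoted command line per client count."""
    return [shlex.join(pgbench_command(c, dbname)) for c in counts]
```

Each returned string can be passed to a shell or to `subprocess.run(shlex.split(cmd))`; parsing the tps figure out of the pgbench output is left out of the sketch.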

> One further cool recognition of the fact that GetSnapshotData()'s
> results can be made to only depend on the set of xids in progress, is
> that caching the results of GetSnapshotData() is almost trivial at that
> point: We only need to recompute snapshots when a toplevel transaction
> commits/aborts.
>
> So we can avoid rebuilding snapshots when no commit has happened since it
> was last built. Which amounts to assigning a current 'commit sequence
> number' to the snapshot, and checking that against the current number
> at the time of the next GetSnapshotData() call. Well, turns out there's
> this "LSN" thing we assign to commits (there are some small issues with
> that though). I've experimented with that, and it considerably further
> improves the numbers above. Both with a higher peak throughput, but more
> importantly it almost entirely removes the throughput regression from
> 2000 connections onwards.
>
> I'm still working on cleaning that part of the patch up, I'll post it in
> a bit.
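
The caching idea in the quoted text can be sketched as a toy model. This is not the patch: a plain integer counter stands in for the 'commit sequence number' (the message mentions using the LSN for that role), the snapshot is reduced to an `(xmax, in-progress xids)` pair, and the class and method names are made up for illustration.

```python
class ProcArraySketch:
    """Toy model of the shared state a snapshot is computed from."""

    def __init__(self):
        self.next_xid = 1
        self.running = set()        # xids of in-progress toplevel transactions
        self.latest_completed = 0   # highest xid that has committed or aborted
        self.completion_count = 0   # hypothetical stand-in for the commit sequence number
        self.rebuilds = 0           # how often a snapshot was actually computed
        self._cached = None
        self._cached_count = None

    def begin(self):
        """Start a toplevel transaction; does not invalidate cached snapshots."""
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def finish(self, xid):
        """Commit or abort: the only event that invalidates cached snapshots."""
        self.running.remove(xid)
        self.latest_completed = max(self.latest_completed, xid)
        self.completion_count += 1

    def _build_snapshot(self):
        self.rebuilds += 1
        xmax = self.latest_completed + 1
        # xids >= xmax are treated as in progress anyway, so only older ones are listed.
        xip = frozenset(x for x in self.running if x < xmax)
        return (xmax, xip)

    def get_snapshot(self):
        """Reuse the cached snapshot unless a transaction finished since it was built."""
        if self._cached_count != self.completion_count:
            self._cached = self._build_snapshot()
            self._cached_count = self.completion_count
        return self._cached
```

In this model, starting a new transaction does not force a rebuild, because its xid is at or above the cached `xmax` and so is already treated as in progress; only `finish()` bumps the counter.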

I triggered a longer run on the same hardware, that also includes
numbers for the caching patch.

nclients  master          pgxact-split    pgxact-split-cache
1         29742.805074    29086.874404    28120.709885
2         58653.005921    56610.432919    57343.937924
3         116580.383993   115102.94057    117512.656103
4         150821.023662   154130.354635   152053.714824
5         186679.754357   189585.156519   191095.841847
6         219013.756252   223053.409306   224480.026711
7         256861.673892   256709.57311    262427.179555
8         291495.547691   294311.524297   296245.219028
9         332835.641015   333223.666809   335460.280487
10        367883.74842    373562.206447   375682.894433
15        561008.204553   578601.577916   587542.061911
20        748000.911053   794048.140682   810964.700467
25        904581.660543   1037279.089703  1043615.577083
30        999231.007768   1251113.123461  1288276.726489
35        1001274.289847  1438640.653822  1438508.432425
40        991672.445199   1518100.079695  1573310.171868
45        994427.395069   1575758.31948   1649264.339117
50        1017561.371878  1654776.716703  1715762.303282
60        993943.210188   1720318.989894  1789698.632656
70        971379.995255   1729836.303817  1819477.25356
80        966276.137538   1744019.347399  1842248.57152
90        901175.211649   1768907.069263  1847823.970726
100       803175.74326    1784636.397822  1865795.782943
125       664438.039582   1806275.514545  1870983.64688
150       623562.201749   1796229.009658  1876529.428419
175       680683.150597   1809321.487338  1910694.40987
200       668413.988251   1833457.942035  1878391.674828
225       682786.299485   1816577.462613  1884587.77743
250       727308.562076   1825796.324814  1864692.025853
275       676295.999761   1843098.107926  1908698.584573
300       698831.398432   1832068.168744  1892735.290045
400       661534.639489   1859641.983234  1898606.247281
500       645149.788352   1851124.475202  1888589.134422
600       740636.323211   1875152.669115  1880653.747185
700       858645.363292   1833527.505826  1874627.969414
800       858287.957814   1841914.668668  1892106.319085
900       882204.933544   1850998.221969  1868260.041595
1000      910988.551206   1836336.091652  1862945.18557
1500      917727.92827    1808822.338465  1864150.00307
2000      982137.053108   1813070.209217  1877104.342864
3000      1013514.639108  1753026.733843  1870416.924248
4000      1025476.80688   1600598.543635  1859908.314496
5000      1019889.160511  1534501.389169  1870132.571895
7500      968558.864242   1352137.828569  1853825.376742
10000     887558.112017   1198321.352461  1867384.381886
15000     687766.593628   950788.434914   1710509.977169
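
To make the table easier to read, the throughput of each patched build can be expressed relative to master. The sketch below does this for four rows copied from the table; the name `speedups` and the choice of rows are mine, not part of the measurement.

```python
# (master, pgxact-split, pgxact-split-cache) copied from the table above.
ROWS = {
    100: (803175.74326, 1784636.397822, 1865795.782943),
    1000: (910988.551206, 1836336.091652, 1862945.18557),
    5000: (1019889.160511, 1534501.389169, 1870132.571895),
    15000: (687766.593628, 950788.434914, 1710509.977169),
}


def speedups(rows=ROWS):
    """Return {nclients: (split / master, cache / master)}."""
    return {
        n: (split / master, cache / master)
        for n, (master, split, cache) in rows.items()
    }


for nclients, (split_x, cache_x) in speedups().items():
    print(f"{nclients:>6}  pgxact-split {split_x:.2f}x  pgxact-split-cache {cache_x:.2f}x")
```

For these rows the caching build stays between roughly 1.8x and 2.5x of master, while the build without caching falls from about 2.2x at 100 clients to about 1.4x at 15000.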

The odd dip for master between 90 and 700 connections looks like it's
not directly related to GetSnapshotData(). It looks like it's related to
the Linux scheduler and virtualization. When a pgbench thread and a
postgres backend need to swap who gets executed, and both are on
different CPUs, the wakeup is more expensive when the target CPU is idle
or isn't going to reschedule soon. In the expensive path an
inter-processor interrupt (IPI) gets triggered, which requires exiting
the VM (which is really expensive on Azure, apparently). I can
trigger similar behaviour for the other runs by renicing, albeit on a
slightly smaller scale.

I'll try to find a larger system that's not virtualized :/.

Greetings,

Andres Freund

Attachment Content-Type Size
image/png 18.2 KB
csn.diff text/x-diff 5.1 KB
