Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Date: 2022-05-29 16:00:14
Message-ID: 6ec612ba-1c97-6845-5ee2-964b80996898@enterprisedb.com
Lists: pgsql-hackers

On 5/28/22 16:12, Ranier Vilela wrote:
> On Sat, May 28, 2022 at 09:00, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 5/28/22 02:15, Ranier Vilela wrote:
> >
> >
> > On Fri, May 27, 2022 at 18:08, Andres Freund
> > <andres(at)anarazel(dot)de> wrote:
> >
> >     Hi,
> >
> >     On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:
> >     > On 5/27/22 02:11, Ranier Vilela wrote:
> >     > > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U
> postgres
> >     > >
> >     > > pgbench (15beta1)
> >     > > transaction type: <builtin: select only>
> >     > > scaling factor: 1
> >     > > query mode: prepared
> >     > > number of clients: 100
> >     > > number of threads: 100
> >     > > maximum number of tries: 1
> >     > > duration: 60 s
> >     > >
> >     > > conns            tps head              tps patched
> >     > >
> >     > > 1         17126.326108            17792.414234
> >     > > 10       82068.123383            82468.334836
> >     > > 50       73808.731404            74678.839428
> >     > > 80       73290.191713            73116.553986
> >     > > 90       67558.483043            68384.906949
> >     > > 100     65960.982801            66997.793777
> >     > > 200     62216.011998            62870.243385
> >     > > 300     62924.225658            62796.157548
> >     > > 400     62278.099704            63129.555135
> >     > > 500     63257.930870            62188.825044
> >     > > 600     61479.890611            61517.913967
> >     > > 700     61139.354053            61327.898847
> >     > > 800     60833.663791            61517.913967
> >     > > 900     61305.129642            61248.336593
> >     > > 1000   60990.918719            61041.670996
> >     > >
> >     >
> >     > These results look much saner, but IMHO it also does not show
> >     > any clear benefit of the patch. Or are you still claiming
> >     > there is a benefit?
> >
> >     They don't look all that sane to me - isn't that way lower than
> >     one would expect?
> >
> > Yes, quite disappointing.
> >
> >     Restricting both client and server to the same four cores on a
> >     thermally challenged older laptop I have around, I get 150k tps
> >     at both 10 and 100 clients.
> >
> > Can you share the benchmark details? Hardware, postgres and
> > pgbench, please?
> >
> >
> >     Either way, I'd not expect any GetSnapshotData() scalability
> >     effects to show up on an "Intel® Core™ i5-8250U CPU Quad Core" -
> >     there's just not enough concurrency.
> >
> > Does this mean that our customers will not see any connection
> > scalability improvements with PG15 on modest hardware?
> >
>
> No. It means that on a 4-core machine GetSnapshotData() is unlikely to
> be a bottleneck, because you'll hit various other bottlenecks way
> earlier.
>
> I personally doubt it even makes sense to worry too much about scaling
> to this many connections on such a tiny system.
>
>
> >
> >     The correct pieces of these changes seem very unlikely to affect
> >     GetSnapshotData() performance meaningfully.
> >
> >     To improve something like GetSnapshotData() you first have to
> >     come up with a workload that shows it being a meaningful part of
> >     a profile. Unless it is, performance differences are going to
> >     just be due to various forms of noise.
> >
> > Actually, in the profiles I got with perf, GetSnapshotData() didn't
> > show up.
> >
>
> But that's exactly the point Andres is trying to make - if you don't see
> GetSnapshotData() in the perf profile, why do you think optimizing it
> will have any meaningful impact on throughput?
>
> You see, I've seen in several places that GetSnapshotData() is the
> bottleneck in scaling connections.
> One of them, if I remember correctly, was at an IBM in Russia.
> Other claims appear in [1][2][3].

No one is claiming GetSnapshotData() can't be a bottleneck on systems
with many cores. That's certainly possible, which is why e.g. Andres
spent a lot of time optimizing for that case.

But that's not what we're arguing about here. You're trying to convince
us that your patch will improve things, and you're supporting that with
numbers from a machine that is unlikely to be hitting this bottleneck.
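
FWIW it's easy to check whether GetSnapshotData() even registers in a
profile on a given machine. A sketch of what I mean (the sampling
window is arbitrary - run it while the pgbench benchmark is going):

    # sample all CPUs with call graphs for 30 seconds
    perf record -g -a -- sleep 30

    # then look for GetSnapshotData in the report
    perf report --stdio | grep -i getsnapshotdata

If the function doesn't show up with a meaningful share of samples,
optimizing it can't move the needle much.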

> Just because I don't have enough hardware to force GetSnapShotData()
> doesn't mean optimizing it won't make a difference.

Well, the question is whether it actually optimizes anything. Maybe it
does, which would be great, but people in this thread (including me)
seem to be fairly skeptical about that claim, because the results are
frankly entirely unconvincing.

I doubt we'll just accept changes in such sensitive places without
results from a relevant machine. Maybe if there was clear agreement
that it's a win, but that's not the case here.

> And even on my modest hardware, we've seen gains, small but consistent.
> So IMHO everyone will benefit, including the small servers.
>

No, we haven't seen any convincing gains. I've tried to explain multiple
times that the results you've shared do not show any clear improvement,
due to having only one run per client count (which means a lot of
noise), the impact of binary layout differences between builds, etc.
You've ignored all of that, so instead of repeating myself, I did a
simple benchmark on my two machines:

1) i5-2500k / 4 cores and 8GB RAM (so similar to what you have)

2) 2x Xeon e5-2620v3 / 16 cores (32 threads), 64GB RAM (so somewhat
bigger)

and I tested 1, 2, 5, 10, 20, 30, 40, 50, 100, ..., 1000 clients, using
the same benchmark as you (pgbench -S -M prepared ...). I did 10 runs
for each client count, to calculate a median, which evens out the
noise. And for fun I tried this with gcc 9.3, 10.3 and 11.2. The script
and results from both machines are attached.
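
To give an idea of the structure, the loop looks roughly like this (a
sketch only, not the attached run.sh verbatim - the run duration and
the CSV output format here are made up):

    #!/bin/sh
    # one CSV line per run: clients,run,tps
    RUNS=10
    DURATION=15    # seconds per run (made-up value)

    for c in 1 2 5 10 20 30 40 50 100 200 300 400 500 \
             600 700 800 900 1000; do
        for r in $(seq 1 $RUNS); do
            # extract the tps number from the pgbench summary line
            tps=$(pgbench -S -M prepared -n -c $c -j $c \
                  -T $DURATION postgres | awk '/^tps/ {print $3}')
            echo "$c,$r,$tps"
        done
    done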

The results from xeon and gcc 11.2 look like this:

 clients    master   patched   diff
 -----------------------------------
       1     46460     44936    97%
       2     87486     84746    97%
       5    199102    192169    97%
      10    344458    339403    99%
      20    515257    512513    99%
      30    528675    525467    99%
      40    592761    594384   100%
      50    694635    706193   102%
     100    643950    655238   102%
     200    690133    696815   101%
     300    670403    677818   101%
     400    678573    681387   100%
     500    665349    678722   102%
     600    666028    670915   101%
     700    662316    662511   100%
     800    647922    654745   101%
     900    650274    654698   101%
    1000    644482    649332   101%

Please explain to me how this shows a consistent, measurable
improvement.

The standard deviation is roughly 1.5% on average, and the differences
are well within that range. Even if there were a tiny improvement at
the high client counts, no sane person would run with that many
clients, because the throughput peaks at ~50 clients. So even if you
gain 1% with 500 clients, it's still less than with 50 clients. If
anything, this shows a regression at the lower client counts.
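
For the record, this is how such numbers can be computed, assuming a
file with one tps value per line for a given client count (the file
name here is made up):

    # median of the 10 runs
    sort -n tps-100-clients.txt \
        | awk '{v[NR]=$1} END {print (NR%2) ? v[(NR+1)/2] : (v[NR/2]+v[NR/2+1])/2}'

    # standard deviation of the 10 runs
    awk '{s+=$1; ss+=$1*$1} END {m=s/NR; print sqrt(ss/NR - m*m)}' \
        tps-100-clients.txt

It's that deviation the 1-2% differences in the table have to be
compared against.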

FWIW this entirely ignores the question of whether this benchmark even
hits the bottleneck this patch aims to improve. Also, there's the
question of correctness, and I'd bet Andres is right that getting a
snapshot without holding ProcArrayLock is busted.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
xeon-chart.pdf application/pdf 26.0 KB
i5.csv text/csv 38.3 KB
run.sh application/x-shellscript 318 bytes
xeon.csv text/csv 38.4 KB
i5.pdf application/pdf 25.6 KB
