Re: Init connection time grows quadratically

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: "Maksim(dot)Melnikov" <m(dot)melnikov(at)postgrespro(dot)ru>
Cc: Потапов Александр <a(dot)potapov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Init connection time grows quadratically
Date: 2026-06-03 13:35:00
Message-ID: CAEze2Wh36P+6DVHy-kKWpxVFZ3M7Bsh16z3zbsZEEy9DpZ5Cjg@mail.gmail.com
Lists: pgsql-hackers

On Wed, 3 Jun 2026 at 08:33, Maksim.Melnikov <m(dot)melnikov(at)postgrespro(dot)ru> wrote:
>
> On 6/16/25 11:56, Потапов Александр wrote:
>
> > To be more precise, I used a constant number of threads (128 and 1024) to compare with the previous results. The quadratic dependency exists in every case; see the new graph.
> >
> > > Q: Did you check that pgbench or the OS does not have
> > > O(n_active_connections) or O(n_active_threads) overhead per worker
> > > during thread creation or connection establishment, e.g. by varying
> > > the number of threads used to manage these N clients? I wouldn't be
> > > surprised if there are inefficiencies in e.g. the threading- or
> > > synchronization model that cause O(N) per-thread overhead, or O(N^2)
> > > overall when you have one thread per connection.
>
>
> Hi, all!
>
> I've investigated a slightly different scenario than Alexander, and I want to share my thoughts in this thread too.
>
> I found that when we run pgbench scenarios sequentially, without a postgres restart between iterations, the initial connection time (ICT) degrades from launch to launch and eventually stabilizes at values worse than the first run (ICT_degradation.png attached).
>
> Scenario details:
[...]
> 4.Add to the postgresql.conf:
> huge_pages = off #for the sake of test stability and reproducibility

I think this is the main culprit of the extreme slowdown: without
huge pages, you're effectively guaranteed to get many minor page
faults, and the TLB misses that come with them. With huge pages
enabled, the proc array should fit on one (or just a few) memory
pages.
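To put rough numbers on this, here is a minimal sketch that counts how many pages a contiguous array spans for a given page size. The 832-byte entry size and the 1000-entry count are hypothetical illustration values, not the real sizeof(PGPROC) or a real TotalProcs; 4 KiB and 2 MiB are just the common x86-64 small and huge page sizes.

```c
#include <stddef.h>

/* Hypothetical sizes, for illustration only; the real sizeof(PGPROC)
 * depends on the PostgreSQL version and build options. */
#define ASSUMED_ENTRY_SIZE	((size_t) 832)
#define SMALL_PAGE_SIZE		((size_t) 4096)				/* common default page size */
#define HUGE_PAGE_SIZE		((size_t) 2 * 1024 * 1024)	/* common x86-64 huge page size */

/*
 * Number of pages spanned by an array of nentries entries of entry_size
 * bytes, assuming the array starts on a page boundary.
 */
size_t
pages_spanned(size_t nentries, size_t entry_size, size_t page_size)
{
	size_t		bytes = nentries * entry_size;

	/* round up: a partially used page still needs a mapping */
	return (bytes + page_size - 1) / page_size;
}
```

With these assumed numbers, 1000 entries (832000 bytes) span 204 small pages but a single huge page, so a process touching every entry may need up to 204 page-table entries instead of one.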

We're not generally in the business of optimizing workloads that run
with huge_pages=off.

> I noticed that the ICT for the first iteration is much better than for the following ones. I investigated this behavior a bit and found many minor page fault events in the ProcArrayAdd function (perf_without_patch-j6.txt attached) for the code line
>
> allProcs[procno].pgxactoff = index;
>
> So every proc.pgxactoff access generates a page fault, because proc objects are accessed in random order in memory and page replacement can occur. I have some ideas on how to improve this: it seems we can put an array of pgxactoff separately

> page replacement can occur

I doubt that this is an issue. Page tables are not removed until the
mapping is removed, and it is highly unlikely that hot shmem areas
(like the PGPROC array) are ever swapped out. It's just that with
smaller memory pages the OS will have to create more page mappings for
the same amount of shared memory, and that'll take more resources
(cpu, memory, time) than it would with large (or huge) memory pages.
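A small Linux-only sketch of this point, using getrusage()'s ru_minflt counter: writing to a fresh shared anonymous mapping typically costs one minor fault per page, while writing to it again from the same process costs essentially none, because the page-table entries stay in place until the mapping is removed. The function name and the region size used below are illustrative assumptions; this does not model PostgreSQL's actual shared memory setup.

```c
#define _DEFAULT_SOURCE			/* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no-I/O) page faults taken by this process so far. */
static long
minor_faults(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return ru.ru_minflt;
}

/*
 * Map a shared anonymous region, write one byte per page twice, and
 * report the minor faults seen during each pass.  Returns 0 on success,
 * -1 if the mapping could not be created.
 */
int
touch_shared_twice(size_t bytes, long *first_pass, long *second_pass)
{
	size_t		page = (size_t) sysconf(_SC_PAGESIZE);
	volatile char *region;
	long		before;

	region = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED)
		return -1;

	before = minor_faults();
	for (size_t off = 0; off < bytes; off += page)
		region[off] = 1;		/* first touch: kernel must create the mapping */
	*first_pass = minor_faults() - before;

	before = minor_faults();
	for (size_t off = 0; off < bytes; off += page)
		region[off] = 2;		/* mapping already exists: no new fault expected */
	*second_pass = minor_faults() - before;

	munmap((void *) region, bytes);
	return 0;
}
```

On a typical system with 4 KiB pages, a 16 MiB region shows on the order of 4096 faults in the first pass and close to zero in the second.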

> in shmem so that only a few hot pages hold them. I've attached the corresponding patch (0001-This-patch-reduce-connection-init-close-time.patch). A perf profile with minor faults for the updated version is also attached (perf_with_patch-j6.txt),

I see. Despite your argument hinging on small pages, I think there is
still some benefit to using a dense array instead of PGPROC.pgxactoff:
With a dense array, ProcArrayAdd/ProcArrayRemove need to touch fewer
cache lines, which are also less likely to be recently dirtied by
unrelated shared proc updates.
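A rough way to see the cache-line argument: count the distinct cache lines written when pgxactoff is updated for n procs with consecutive proc numbers (the best case), once with the field embedded in a large per-proc struct and once in a dense int array. This assumes, as the quoted line suggests, that ProcArrayAdd rewrites pgxactoff for each shifted entry. The 832-byte stride is a hypothetical stand-in for sizeof(PGPROC), and the 64-byte line size is an assumption.

```c
#include <stddef.h>

#define CACHE_LINE_SIZE 64		/* typical on x86-64; an assumption here */

/*
 * Count the distinct cache lines touched when writing one 4-byte field
 * of elements 0..n-1, where element i's field lives at
 * base_off + i * stride bytes from a cache-line-aligned start.
 * Addresses increase monotonically, so counting line changes suffices.
 */
size_t
cache_lines_touched(size_t base_off, size_t stride, size_t n)
{
	size_t		count = 0;
	size_t		last_line = (size_t) -1;

	for (size_t i = 0; i < n; i++)
	{
		size_t		line = (base_off + i * stride) / CACHE_LINE_SIZE;

		if (line != last_line)
		{
			count++;
			last_line = line;
		}
	}
	return count;
}
```

Under these assumptions, updating 64 procs touches 64 lines with an 832-byte stride, but only 4 lines with a dense array of int (256 bytes).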

However, I'm now a bit more concerned about the number of indirections
required for other operations. Before, pgxactoff was read at a fixed
offset from the PGPROC pointer; with this patch, getting its value
first requires converting the PGPROC pointer into a proc number and
then indexing ProcGlobal->pgxactoffs.
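The difference in indirection can be sketched with simplified stand-in types (FakeProc and FakeProcHdr are hypothetical, not the real PGPROC and PROC_HDR). Before the patch, reading the offset is one load at a fixed offset from the proc pointer; with the patch's pointer-based macro, it needs a pointer subtraction (which implies a division by the struct size), a load of the array base from the header, and then the indexed load.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for PGPROC and PROC_HDR. */
typedef struct FakeProc
{
	char		other_fields[828];	/* placeholder for unrelated members */
	int			pgxactoff;			/* pre-patch: offset stored in the proc */
} FakeProc;

typedef struct FakeProcHdr
{
	FakeProc   *allProcs;
	int		   *pgxactoffs;			/* patched: dense array indexed by procno */
} FakeProcHdr;

/* Pre-patch: a single load at a fixed offset from the proc pointer. */
int
xactoff_before(const FakeProc *proc)
{
	return proc->pgxactoff;
}

/*
 * Patched (pointer-based): compute the proc number by pointer
 * subtraction, load the array base from the header, then index it.
 */
int
xactoff_patched(const FakeProcHdr *hdr, const FakeProc *proc)
{
	ptrdiff_t	procno = proc - hdr->allProcs;

	return hdr->pgxactoffs[procno];
}
```

Both accessors return the same value as long as the two copies are kept in sync; the question raised above is only how much extra work the second path costs on hot code paths.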

> as we can see, the patched version fixes this. I made a series of measurements for all versions and attached a comparison chart (ICT_degradation_with_patch.png). I also include a table with the results

Do you happen to have data with huge_pages enabled?

> I hope it will be interesting and helpful.

Definitely interesting. I'm not so sure it's as effective on a
production configuration (with huge pages enabled), but I'm definitely
interested in seeing test results.

----

Some comments on the patch:

> +++ b/src/backend/storage/lmgr/proc.c
> + size = add_size(size, mul_size(TotalProcs, sizeof(int)));

Let's use the following, to fit the surrounding pattern:

+ size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->pgxactoffs)));

> @@ -273,7 +274,10 @@ ProcGlobalShmemInit(void *arg)
> ProcGlobal->statusFlags = (uint8 *) ptr;
> ptr = ptr + (TotalProcs * sizeof(*ProcGlobal->statusFlags));
>
> - /* make sure we didn't overflow */
> + ProcGlobal->pgxactoffs = (int *) ptr;
> + ptr = (char *) ptr + TotalProcs * sizeof(int);
> +
> + /* make sure wer didn't overflow */
> Assert((ptr > (char *) procs) && (ptr <= (char *) procs + requestSize));

This needs to be updated, because right now it fails to account for
alignment when TotalProcs * sizeof(*ProcGlobal->statusFlags) is not a
multiple of sizeof(int). The other fields take care to be correctly
aligned, but your code doesn't do that yet. It's probably best to
allocate and assign pgxactoffs just ahead of statusFlags.
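The alignment issue can be illustrated with plain offset arithmetic. This sketch places a uint8 status array and an int array back to back in one buffer, in either order, and checks whether the int array starts at a properly aligned offset. The function names are hypothetical and this is not the actual ProcGlobalShmemInit code; it also assumes the buffer itself starts suitably aligned.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte offset of the int array in a buffer holding total_procs status
 * bytes (uint8) and total_procs ints, laid out in the given order.
 */
size_t
int_array_offset(size_t total_procs, int ints_first)
{
	if (ints_first)
		return 0;			/* int array leads; the uint8 array needs no alignment */
	return total_procs * sizeof(uint8_t);	/* int array trails the status bytes */
}

/* Round off up to the next multiple of align (the padding alternative). */
size_t
align_up(size_t off, size_t align)
{
	return (off + align - 1) / align * align;
}

/* Nonzero if an int placed at byte offset off would be misaligned. */
int
is_misaligned_for_int(size_t off)
{
	return off % _Alignof(int) != 0;
}
```

With an odd TotalProcs such as 101, placing the ints after the status bytes yields a misaligned start, while placing them first (or padding the offset up) does not. Putting the wider type first avoids the padding altogether.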

> + /* make sure wer didn't overflow */

New typo introduced.

> +++ b/src/include/storage/proc.h
> +#define GetXactOffPGProc(proc) (ProcGlobal->pgxactoffs[(proc) - &ProcGlobal->allProcs[0]])
> +#define GetMyXactOffPGProc() (GetXactOffPGProc(MyProc))

I'd replace this with

+#define ProcGetXactOff(procno) (ProcGlobal->pgxactoffs[(procno)])
+#define ProcGetMyXactOff() (ProcGetXactOff(MyProcNumber))

That way, callers can still use GetNumberFromPGProc() themselves when
they only have a PGPROC pointer, while the pointer-arithmetic
Proc-to-Number conversion is avoided whenever the proc number is
already known.
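A minimal sketch of how the proc-number-keyed macros would be used, with simplified stand-ins (DemoProc, DemoProcHdr, and the local ProcGlobal, MyProcNumber and GetNumberFromPGProc definitions are hypothetical mock-ups loosely modeled on the real names, not the actual PostgreSQL declarations). Callers that already know a proc number index the dense array directly; only callers holding a pointer pay for the conversion.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real PostgreSQL structs. */
typedef struct DemoProc
{
	int			dummy;
} DemoProc;

typedef struct DemoProcHdr
{
	DemoProc   *allProcs;
	int		   *pgxactoffs;		/* dense array indexed by proc number */
} DemoProcHdr;

static DemoProcHdr *ProcGlobal;	/* stand-in for the shared header pointer */
static int	MyProcNumber;		/* stand-in for the backend's own proc number */

/* Pointer-to-number conversion, needed only when a caller holds a pointer. */
#define GetNumberFromPGProc(proc)	((int) ((proc) - &ProcGlobal->allProcs[0]))

/* Suggested API: index directly by proc number. */
#define ProcGetXactOff(procno)		(ProcGlobal->pgxactoffs[(procno)])
#define ProcGetMyXactOff()			(ProcGetXactOff(MyProcNumber))
```

For example, with pgxactoffs = {2, 0, 1} and MyProcNumber = 1, ProcGetMyXactOff() reads pgxactoffs[1] without any pointer arithmetic, while ProcGetXactOff(GetNumberFromPGProc(&allProcs[0])) takes the pointer path explicitly.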

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
