Re: Estimating HugePages Requirements?

From: Don Seiler <don(at)seiler(dot)us>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: P C <puravc(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Estimating HugePages Requirements?
Date: 2021-06-14 14:16:39
Message-ID: CAHJZqBAZ+SYR4jZ-Jy5nHYwUP3vYF+UjPGKwCR+gZm0z8vyoag@mail.gmail.com
Lists: pgsql-admin pgsql-hackers

On Thu, Jun 10, 2021 at 7:23 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> On Wed, Jun 09, 2021 at 10:55:08PM -0500, Don Seiler wrote:
> > On Wed, Jun 9, 2021, 21:03 P C <puravc(at)gmail(dot)com> wrote:
> >
> > > I agree, it's confusing for many, and that confusion arises from the fact
> > > that you usually talk of shared_buffers in MB or GB, whereas hugepages
> > > have to be configured in units of 2MB. But once they understand, they
> > > realize it's pretty simple.
> > >
> > > Don, we have experienced the same, not just with postgres but also with
> > > oracle. I haven't been able to get to the root of it, but what we usually
> > > do is add another 100-200 pages, and that works for us. If the SGA or
> > > shared_buffers is high, e.g. 96GB, then we add 250-500 pages. Those few
> > > hundred MBs may be wasted (because the moment you configure hugepages, the
> > > operating system considers it as used and does not use it any more), but
> > > nowadays servers easily have 64 or 128GB of RAM, and wasting that 500MB to
> > > 1GB does not really hurt.
> >
> > I don't have a problem with the math, just wanted to know if it was
> > possible to better estimate what the actual requirements would be at
> > deployment time. My fallback will probably be what you did: just pad with
> > an extra 512MB by default.
>
> It's because the huge allocation isn't just shared_buffers, but also
> wal_buffers:
>
> | The amount of shared memory used for WAL data that has not yet been written to disk.
> | The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ...
>
> .. and other stuff:
>
> src/backend/storage/ipc/ipci.c
> * Size of the Postgres shared-memory block is estimated via
> * moderately-accurate estimates for the big hogs, plus 100K for the
> * stuff that's too small to bother with estimating.
> *
> * We take some care during this phase to ensure that the total size
> * request doesn't overflow size_t. If this gets through, we don't
> * need to be so careful during the actual allocation phase.
> */
> size = 100000;
> size = add_size(size, PGSemaphoreShmemSize(numSemas));
> size = add_size(size, SpinlockSemaSize());
> size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
>                                          sizeof(ShmemIndexEnt)));
> size = add_size(size, dsm_estimate_size());
> size = add_size(size, BufferShmemSize());
> size = add_size(size, LockShmemSize());
> size = add_size(size, PredicateLockShmemSize());
> size = add_size(size, ProcGlobalShmemSize());
> size = add_size(size, XLOGShmemSize());
> size = add_size(size, CLOGShmemSize());
> size = add_size(size, CommitTsShmemSize());
> size = add_size(size, SUBTRANSShmemSize());
> size = add_size(size, TwoPhaseShmemSize());
> size = add_size(size, BackgroundWorkerShmemSize());
> size = add_size(size, MultiXactShmemSize());
> size = add_size(size, LWLockShmemSize());
> size = add_size(size, ProcArrayShmemSize());
> size = add_size(size, BackendStatusShmemSize());
> size = add_size(size, SInvalShmemSize());
> size = add_size(size, PMSignalShmemSize());
> size = add_size(size, ProcSignalShmemSize());
> size = add_size(size, CheckpointerShmemSize());
> size = add_size(size, AutoVacuumShmemSize());
> size = add_size(size, ReplicationSlotsShmemSize());
> size = add_size(size, ReplicationOriginShmemSize());
> size = add_size(size, WalSndShmemSize());
> size = add_size(size, WalRcvShmemSize());
> size = add_size(size, PgArchShmemSize());
> size = add_size(size, ApplyLauncherShmemSize());
> size = add_size(size, SnapMgrShmemSize());
> size = add_size(size, BTreeShmemSize());
> size = add_size(size, SyncScanShmemSize());
> size = add_size(size, AsyncShmemSize());
> #ifdef EXEC_BACKEND
> size = add_size(size, ShmemBackendArraySize());
> #endif
>
> /* freeze the addin request size and include it */
> addin_request_allowed = false;
> size = add_size(size, total_addin_request);
>
> /* might as well round it off to a multiple of a typical page size */
> size = add_size(size, 8192 - (size % 8192));
>
> BTW, I think it'd be nice if this were a NOTICE:
> | elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", allocsize);
>

Great detail. I did some trial and error around just a few variables
(shared_buffers, wal_buffers, max_connections) and came up with a formula
that seems to be "good enough" for at least a rough default estimate.

The pseudo-code is basically:

ceiling((shared_buffers + 200 + (25 * shared_buffers / 1024) +
         10 * (max_connections - 100) / 200 + (wal_buffers - 16)) / 2)

This assumes that all values are in MB, that huge pages are 2MB each (hence
the final division by 2), and that wal_buffers is set explicitly rather than
left at the default of -1. I decided to set wal_buffers to 16MB in our
environments, since that's what -1 should resolve to, based on the
description in the documentation, for an instance with shared_buffers of the
sizes in our deployments.
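
As a rough illustration, the pseudo-code above could be sketched in Python as follows. The function and parameter names are hypothetical. `base_mb` stands for the starting 200 term, and `huge_page_mb` generalizes the final division by 2. The helper `default_wal_buffers_mb` is an assumption layered on the documentation text quoted earlier: it takes 1/32 of shared_buffers, with a 64kB floor and a cap of one WAL segment (assumed to be 16MB).

```python
import math


def default_wal_buffers_mb(shared_buffers_mb, wal_segment_mb=16):
    """Approximate what wal_buffers = -1 resolves to, in MB.

    Assumption: 1/32 of shared_buffers, no less than 64kB and no more
    than one WAL segment (16MB unless the cluster was built otherwise).
    """
    return min(max(shared_buffers_mb / 32, 64 / 1024), wal_segment_mb)


def estimate_huge_pages(shared_buffers_mb, wal_buffers_mb=16,
                        max_connections=100, base_mb=200, huge_page_mb=2):
    """Rough huge page count from the trial-and-error formula.

    All sizes are in MB. This is an empirical estimate, not the exact
    shared memory size that the server computes at startup.
    """
    total_mb = (
        shared_buffers_mb
        + base_mb                                # fixed padding term
        + 25 * shared_buffers_mb / 1024          # 25MB per GB of shared_buffers
        + 10 * (max_connections - 100) / 200     # 10MB per 200 extra connections
        + (wal_buffers_mb - 16)                  # adjustment relative to 16MB
    )
    return math.ceil(total_mb / huge_page_mb)


if __name__ == "__main__":
    # 8GB shared_buffers, 300 connections, wal_buffers at 16MB
    print(estimate_huge_pages(8192, max_connections=300))  # 4301
    # 2GB shared_buffers with the padding term raised to 250
    print(estimate_huge_pages(2048, base_mb=250))          # 1174
```

Keeping the padding term as a parameter makes it easy to raise it for small instances without touching the rest of the calculation.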

This formula did come up a little short (2MB) when I had a low
shared_buffers value at 2GB. Raising that starting 200 value to something
like 250 would take care of that. The limited testing I did based on
different values we see across our production deployments worked otherwise.
Please let me know what you folks think. I know I'm ignoring a lot of other
factors, especially given what Justin recently shared.

The remaining trick for me now is to calculate this in Chef, since the
shared_buffers and wal_buffers attributes are strings with the unit ("MB")
in them, rather than just numerical values. I'm thinking of changing those
attributes to plain numbers and assuming/requiring MB to make the
calculations easier.
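
If the attributes stay as strings, one alternative is to convert them to MB before doing the arithmetic. The following is a minimal sketch in Python rather than Chef's Ruby, with a hypothetical helper name `to_mb`. It assumes the values are whole numbers followed by one of the memory units PostgreSQL accepts (kB, MB, GB, TB, which are case-sensitive) and rejects anything else.

```python
import re

# Multipliers that convert each accepted unit to MB.
_UNIT_TO_MB = {"kB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024 * 1024}

_SETTING_RE = re.compile(r"\s*(\d+)\s*(kB|MB|GB|TB)\s*")


def to_mb(value):
    """Convert a setting string such as "8GB" or "512MB" to a number of MB.

    Raises ValueError for strings without a recognized unit, so a bare
    number is never silently interpreted in the wrong unit.
    """
    match = _SETTING_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized memory setting: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_MB[unit]


if __name__ == "__main__":
    print(to_mb("8GB"))    # 8192
    print(to_mb("512MB"))  # 512
```

Requiring an explicit unit avoids guessing what a bare number means, since PostgreSQL itself interprets unitless values differently per setting.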

--
Don Seiler
www.seiler.us
