Re: proposal: Allocate work_mem From Pool

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Joseph D Wagner <joe(at)josephdwagner(dot)info>
Cc: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: proposal: Allocate work_mem From Pool
Date: 2022-07-13 22:23:42
Message-ID: 20220713222342.GE18011@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 12, 2022 at 08:49:10PM -0700, Joseph D Wagner wrote:
> > > Before I try to answer that, I need to know how the scheduler works.
>
> > As I understand the term used, there is no scheduler inside Postgres
> > for user connections -- they're handled by the OS kernel.
>
> Then, I'm probably using the wrong term. Right now, I have
> max_worker_processes set to 16. What happens when query #17
> wants some work done? What do you call the thing that handles
> that? What is its algorithm for allocating work to the processes?
> Or am I completely misunderstanding the role worker processes
> play in execution?

max_connections limits the number of client connections (queries).
Background workers are a relatively new thing - they didn't exist until v9.3.

There is no scheduler, unless you run a connection pooler between the
application and the DB. Which you should probably do, since you've set
work_mem measured in GB on a server with 10s of GB of RAM, and while using
partitioning.

https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-MAX-CONNECTIONS
|max_connections (integer)
|
| Determines the maximum number of concurrent connections to the database server. The default is typically 100 connections, but might be less if your kernel settings will not support it (as determined during initdb). This parameter can only be set at server start.

https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-MAX-WORKER-PROCESSES
|max_worker_processes (integer)
| Sets the maximum number of background processes that the system can support. This parameter can only be set at server start. The default is 8.
| When running a standby server, you must set this parameter to the same or higher value than on the primary server. Otherwise, queries will not be allowed in the standby server.
| When changing this value, consider also adjusting max_parallel_workers, max_parallel_maintenance_workers, and max_parallel_workers_per_gather.

https://www.postgresql.org/docs/current/connect-estab.html
|PostgreSQL implements a “process per user” client/server model. In this model, every client process connects to exactly one backend process. As we do not know ahead of time how many connections will be made, we have to use a “supervisor process” that spawns a new backend process every time a connection is requested. This supervisor process is called postmaster and listens at a specified TCP/IP port for incoming connections. Whenever it detects a request for a connection, it spawns a new backend process. Those backend processes communicate with each other and with other processes of the instance using semaphores and shared memory to ensure data integrity throughout concurrent data access.
|
|The client process can be any program that understands the PostgreSQL protocol described in Chapter 53. Many clients are based on the C-language library libpq, but several independent implementations of the protocol exist, such as the Java JDBC driver.
|
|Once a connection is established, the client process can send a query to the backend process it's connected to. The query is transmitted using plain text, i.e., there is no parsing done in the client. The backend process parses the query, creates an execution plan, executes the plan, and returns the retrieved rows to the client by transmitting them over the established connection.

https://www.postgresql.org/docs/current/tutorial-arch.html
| The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts (“forks”) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the supervisor server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)

https://wiki.postgresql.org/wiki/FAQ#How_does_PostgreSQL_use_CPU_resources.3F
|The PostgreSQL server is process-based (not threaded). Each database session connects to a single PostgreSQL operating system (OS) process. Multiple sessions are automatically spread across all available CPUs by the OS. The OS also uses CPUs to handle disk I/O and run other non-database tasks. Client applications can use threads, each of which connects to a separate database process. Since version 9.6, portions of some queries can be run in parallel, in separate OS processes, allowing use of multiple CPU cores. Parallel queries are enabled by default in version 10 (max_parallel_workers_per_gather), with additional parallelism expected in future releases.

BTW, since this is amply documented, I have to point out that it's not on-topic
for the development list.

--
Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-07-13 23:03:48 Re: Refactor to make use of a common function for GetSubscriptionRelations and GetSubscriptionNotReadyRelations.
Previous Message Tom Lane 2022-07-13 21:25:02 Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.