| From: | Shirisha Shirisha <shirisha(dot)sn(at)broadcom(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Cc: | soumyadeep2007(at)gmail(dot)com, Ashwin Agrawal <ashwinstar(at)gmail(dot)com> |
| Subject: | Redux: Throttle WAL inserts before commit |
| Date: | 2024-08-27 10:50:40 |
| Message-ID: | CAP3-t08umaBEUEppzBVY6==3tbdLwG7b4wfrba73zfOAUrRsoQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hello hackers,
This is an attempt to resurrect the thread [1] on throttling WAL
inserts before the point of commit.
Background:
When synchronous_commit is on, transactions wait at commit for
replication, ensuring WAL is flushed up to the commit LSN on the
standby. While commit is a mandatory sync/wait point, waiting for
replication at periodic intervals en route may be desirable and more
efficient, to act as a good citizen. Consider, for example, a setup
where the primary and standby can each write at 20GB/sec, while the
network between them can only transfer 2GB/sec. A CTAS for a large
table in such a setup generates WAL very aggressively on the primary,
but it cannot be shipped to the standby at that rate, so pending WAL
builds up on the primary at roughly 18GB/sec. This causes two main
problems:
- Fairness: new write transactions (even single-tuple I/U/D), and even
read transactions (setting hint bits), incur latency proportional to
the pending WAL that must be shipped and flushed to the standby.
- Space: the primary must retain that much WAL, since until the WAL is
shipped to the standby it cannot be recycled when replication slots
are in use.
Proposed solution (patch attached):
- A global (backend-local) variable wal_bytes_written tracks the amount
of WAL written by the backend since the start of the transaction or
the last time SyncRepWaitForLSN() was called for this transaction.
- Whenever wal_bytes_written exceeds the new
wait_for_replication_threshold GUC, we set the control flag
XlogThrottlePending (similar in spirit to LogMemoryContextPending),
which is then handled at ProcessInterrupts() time. This is the
mechanism proposed in [2]. Doing it this way avoids issues such as
holding locks inside a critical section.
- To do the wait itself, we rely on SyncRepWaitForLSN(), with the
cached value of the WAL flush point. A rough sketch of this control
flow follows the list.
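To make the control flow concrete, here is a minimal sketch, not the
attached patch: wal_bytes_written, XlogThrottlePending,
wait_for_replication_threshold, and SyncRepWaitForLSN() come from the
description above, while the helper XLogThrottleIfNeeded(), the use of
GetXLogWriteRecPtr() as the wait target, and the kilobyte unit for the
GUC are assumptions for illustration only.

#include "postgres.h"
#include "miscadmin.h"           /* InterruptPending */
#include "access/xlog.h"         /* GetXLogWriteRecPtr() */
#include "replication/syncrep.h" /* SyncRepWaitForLSN() */

/* Backend-local WAL accounting; reset after each replication wait. */
static uint64 wal_bytes_written = 0;

/* GUC: wait for replication after this many kB of WAL (0 disables). */
int wait_for_replication_threshold = 0;

/* Set inside critical sections; acted upon at ProcessInterrupts() time. */
volatile sig_atomic_t XlogThrottlePending = false;

/*
 * Hypothetical helper called from the WAL insertion path, possibly
 * inside a critical section, so it only sets flags; the wait itself
 * happens later, in ProcessInterrupts().
 */
static void
XLogThrottleIfNeeded(uint64 nbytes)
{
    wal_bytes_written += nbytes;
    if (wait_for_replication_threshold > 0 &&
        wal_bytes_written >= (uint64) wait_for_replication_threshold * 1024)
    {
        XlogThrottlePending = true;
        InterruptPending = true;
    }
}

/* Then, in ProcessInterrupts(), outside any critical section: */
if (XlogThrottlePending)
{
    XlogThrottlePending = false;
    /*
     * The patch waits on a cached WAL flush point; GetXLogWriteRecPtr()
     * stands in for it in this sketch.
     */
    SyncRepWaitForLSN(GetXLogWriteRecPtr(), false);
    wal_bytes_written = 0;
}

Since SyncRepWaitForLSN() can sleep for a long time, deferring it to
ProcessInterrupts() keeps the backend from blocking while holding WAL
insertion locks or while inside a critical section.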
[1] https://www.postgresql.org/message-id/flat/CAHg%2BQDcO_zhgBCMn5SosvhuuCoJ1vKmLjnVuqUEOd4S73B1urw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20220105174643.lozdd3radxv4tlmx%40alap3.anarazel.de
Regards,
Shirisha
Broadcom Inc.
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-WAL-throttling-mechanism-for-synchronous-replicat.patch | application/octet-stream | 12.6 KB |