Re: Syncrep and improving latency due to WAL throttling

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Syncrep and improving latency due to WAL throttling
Date: 2023-02-01 10:04:14
Message-ID: CAKZiRmwfdm-ECypgo4xgv11cTMZiSXrGiuoJpjOiqJyYvoTf2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 30, 2023 at 9:16 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

Hi Bharath, thanks for reviewing.

> I think measuring the number of WAL flushes with and without this
> feature that the postgres generates is great to know this feature
> effects on IOPS. Probably it's even better with variations in
> synchronous_commit_flush_wal_after or the throttling limits.

It's the same as point 19, so I covered it there.

> Although v2 is a WIP patch, I have some comments:
> 1. Coding style doesn't confirm to the Postgres standard:

Fixed all thoses that you mentioned and I've removed elog() and code
for timing. BTW: is there a way to pgindent only my changes (on git
diff?)

> 2. It'd be better to add a TAP test hitting the WAL throttling.

TODO, any hints on how that test should look like are welcome
(checking pg_stat_wal?) I've could spot only 1 place for it --
src/test/recovery/007_sync_rep.pl

> 3. We generally don't need timings to be calculated explicitly when we
> emit before and after log messages. If needed one can calculate the
> wait time from timestamps of the log messages. If it still needs an
> explicit time difference which I don't think is required, because we
> have a special event and before/after log messages, I think it needs
> to be under track_wal_io_timing to keep things simple.

Removed, it was just debugging aid to demonstrate the effect in the
session waiting.

> 4. XLogInsertRecord() can be a hot path for high-write workload, so
> effects of adding anything in it needs to be measured. So, it's better
> to run benchmarks with this feature enabled and disabled.

When the GUC is off(0 / default), in my tests the impact is none (it's
just set of simple IFs), however if the feature is enabled then the
INSERT is slowed down (I've repeated the initial test from 1st post
and single-statement INSERT for 50MB WAL result is the same 4-5s and
~23s +/- 1s when feature is activated when the RTT is 10.1ms, but
that's intentional).

> 5. Missing documentation of this feature and the GUC. I think we can
> describe extensively why we'd need a feature of this kind in the
> documentation for better adoption or usability.

TODO, I'm planning on adding documentation when we'll be a little bit
closer to adding to CF.

> 6. Shouldn't the max limit be MAX_KILOBYTES?
> + &synchronous_commit_flush_wal_after,
> + 0, 0, 1024L*1024L,

Fixed.

> 7. Can this GUC name be something like
> synchronous_commit_wal_throttle_threshold to better reflect what it
> does for a backend?
> + {"synchronous_commit_flush_wal_after", PGC_USERSET,
> REPLICATION_SENDING,

Done.

> 8. A typo - s/confiration/confirmation
[..]
> 9. This
> "Sets the maximum logged WAL in kbytes, after which wait for sync
> commit confiration even without commit "
> better be something like below?
> "Sets the maximum amount of WAL in kilobytes a backend generates after
> which it waits for synchronous commit confiration even without commit
> "

Fixed as you have suggested.

> 10. I think there's nothing wrong in adding some assertions in
> HandleXLogDelayPending():
> Assert(synchronous_commit_flush_wal_after > 0);
> Assert(backendWalInserted > synchronous_commit_flush_wal_after * 1024L);
> Assert(XactLastThrottledRecEnd is not InvalidXLogRecPtr);

Sure, added.

> 11. Specify the reason in the comments as to why we had to do the
> following things:
> Here:
> + /* flush only up to the last fully filled page */
> + XLogRecPtr LastFullyWrittenXLogPage = XactLastThrottledRecEnd
> - (XactLastThrottledRecEnd % XLOG_BLCKSZ);
> Talk about how this avoids multiple-flushes for the same page.
>
> Here:
> + * Called from ProcessMessageInterrupts() to avoid waiting while
> being in critical section
> + */
> +void HandleXLogDelayPending()
> Talk about how waiting in a critical section, that is in
> XLogInsertRecord() causes locks to be held longer durations and other
> effects.

Added.

> Here:
> + /* WAL throttling */
> + backendWalInserted += rechdr->xl_tot_len;
> Be a bit more verbose about why we try to throttle WAL and why only
> for sync replication, the advantages, effects etc.

Added.

> 12. This better be a DEBUG1 or 2 message instead of WARNINGs to not
> clutter server logs.
> + /* XXX Debug for now */
> + elog(WARNING, "throttling WAL down on this session
> (backendWalInserted=%d, LSN=%X/%X flushingTo=%X/%X)",
> + backendWalInserted,
> + LSN_FORMAT_ARGS(XactLastThrottledRecEnd),
> + LSN_FORMAT_ARGS(LastFullyWrittenXLogPage));
>
> + elog(WARNING, "throttling WAL down on this session - end (%f
> ms)", timediff);

OK, that's entirely removed.

> 13. BTW, we don't need to hold interrupts while waiting for sync
> replication ack as it may block query cancels or proc die pendings.
> You can remove these, unless there's a strong reason.
> + //HOLD_INTERRUPTS();
> + //RESUME_INTERRUPTS();

Sure, removed. However, one problem I've discovered is that we were
hitting Assert(InterruptHoldoffCount > 0) in SyncRepWaitForLSN, so
I've fixed that too.

> 14. Add this wait event in the documentation.
> + case WAIT_EVENT_SYNC_REP_THROTTLED:
> + event_name = "SyncRepThrottled";
> + break;
>
> 15. Why can XLogDelayPending not be a volatile atomic variable? Is it
> because it's not something being set within a signal handler?
> extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
> +extern PGDLLIMPORT bool XLogDelayPending;

Added a comment explaining that.

> 16.
> + instr_time wal_throttle_time_start, wal_throttle_time_end;
> + double timediff;
> + XLogDelayPending = false;
> An extra line needed after variable declaration and assignment.

Fixed.

> 17. I think adding how many times a backend throttled WAL to
> pg_stat_activity is a good metric.

Good idea, added; catversion and pgstat format id were bumped. I've
also added it to the per-query EXPLAIN (WAL) so it logs something like
"WAL: records=500000 bytes=529500000 throttled=2016" , however I would
appreciate a better name proposal on how to name that.

> 18. Can you avoid the need of new functions SyncRepWaitForLSNInternal
> and SyncRepWaitForLSNThrottled by relying on the global throttling
> state to determine the correct waitevent in SyncRepWaitForLSN?

Done.

> 19. I think measuring the number of WAL flushes with and without this
> feature that the postgres generates is great to know this feature
> effects on IOPS. Probably it's even better with variations in
> synchronous_commit_flush_wal_after or the throttling limits.

default =>
INSERT 0 500000
Time: 6996.340 ms (00:06.996)
-[ RECORD 1 ]----+------------------------------
wal_records | 500001
wal_fpi | 0
wal_bytes | 529500034
wal_buffers_full | 44036
wal_write | 44056
wal_sync | 40
wal_write_time | 317.991
wal_sync_time | 2690.425
wal_throttled | 0
stats_reset | 2023-02-01 10:31:44.475651+01
and 1 call to probe_postgres:XLogFlush

set synchronous_commit_wal_throttle_threshold to '256kB'; =>
INSERT 0 500000
Time: 25476.155 ms (00:25.476)
-[ RECORD 1 ]----+------------------------------
wal_records | 500001
wal_fpi | 0
wal_bytes | 529500034
wal_buffers_full | 0
wal_write | 2062
wal_sync | 2062
wal_write_time | 180.177
wal_sync_time | 1409.522
wal_throttled | 2016
stats_reset | 2023-02-01 10:32:01.955513+01
and 2017 calls to probe_postgres:XLogFlush

set synchronous_commit_wal_throttle_threshold to '1MB'; =>
INSERT 0 500000
Time: 10113.278 ms (00:10.113)
-[ RECORD 1 ]----+------------------------------
wal_records | 642864
wal_fpi | 0
wal_bytes | 537929315
wal_buffers_full | 0
wal_write | 560
wal_sync | 559
wal_write_time | 182.586
wal_sync_time | 988.72
wal_throttled | 504
stats_reset | 2023-02-01 10:32:36.250678+01

Maybe we should avoid calling fsyncs for WAL throttling? (by teaching
HandleXLogDelayPending()->XLogFlush()->XLogWrite() to NOT to sync when
we are flushing just because of WAL thortting ?) Would that still be
safe?

-J.

Attachment Content-Type Size
v3-0001-WIP-Syncrep-and-improving-latency-due-to-WAL-thro.patch application/octet-stream 19.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2023-02-01 10:05:32 Re: MacOS: xsltproc fails with "warning: failed to load external entity"
Previous Message shiy.fnst@fujitsu.com 2023-02-01 09:40:41 RE: Time delayed LR (WAS Re: logical replication restrictions)