| From: | bricklen <bricklen(at)gmail(dot)com> | 
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> | 
| Cc: | "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>, "pgsql-performance(at)postgresql(dot)org list" <pgsql-performance(at)postgresql(dot)org> | 
| Subject: | Re: [GENERAL] Questionaire: Common WAL write rates on busy servers. | 
| Date: | 2017-04-25 14:19:31 | 
| Message-ID: | CAGrpgQ-kdUawMakxoe-yd=3YKTGBgNhkd7DLYTUG4oMZevWD7g@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general pgsql-performance | 
On Mon, Apr 24, 2017 at 9:17 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Questions (answer as many you can comfortably answer):
> - How many MB/s, segments/s do you see on busier servers?
> - What generates the bulk of WAL on your servers (9.5+ can use
>   pg_xlogdump --stats to compute that)?
> - Are you seeing WAL writes being a bottleneck?OA
> - What kind of backup methods are you using and is the WAL volume a
>   problem?
> - What kind of replication are you using and is the WAL volume a
>   problem?
> - What are your settings for wal_compression, max_wal_size (9.5+) /
>   checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?
> - Could you quickly describe your workload?
>
* Postgresql 9.3
* 1500+ db servers
* Daily peak for busy databases: 75 WALs switched per second (less than 10%
of the servers experience this)
* Avg per db: 2 WALs/s
* Mainly generated by large batch sync processes that occur throughout the
day, and by a legacy archiving process to purge older data (potentially
many millions of cascading deletes).
*Half the servers have (encrypted) pg_dump backups, WAL volume hasn't
proved to be a problem there, though dump size is a problem for a few of
the larger databases (less than 1TB).
* Inter-data-centre replication is all streaming, across DC's (over the
WAN) WAL shipping is over compressed SSH tunnels.
Occasionally the streaming replication falls behind, but more commonly it
is the cross-DC log shipping that becomes a problem. Some of the servers
will generate 50+ GBs of WAL in a matter of minutes and that backs up
immediately on the masters. Occasionally this has a knock-on effect for
other servers and slows down their log shipping due to network saturation.
* checkpoint_segments: 64, checkpoint_timeout: 5 mins, wal_buffers: 16MB
Workload:
70% of servers are generally quiet, with occasional bursty reads and writes.
20% are medium use, avg a few hundred transactions/second
10% average around 5k txns/s, with bursts up to 25k txns/s for several
minutes.
All servers have about 80% reads / 20% writes, though those numbers flip
during big sync jobs and when the purging maintenance kicks off.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David G. Johnston | 2017-04-25 14:25:34 | Re: pg_dump: creates dumps that cannot be restored | 
| Previous Message | Thorsten Glaser | 2017-04-25 11:15:35 | Re: pg_dump: creates dumps that cannot be restored | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Claudio Freire | 2017-04-25 16:07:41 | Re: Questionaire: Common WAL write rates on busy servers. | 
| Previous Message | Vladimir Borodin | 2017-04-25 07:56:14 | Re: Questionaire: Common WAL write rates on busy servers. |