Re: Questionaire: Common WAL write rates on busy servers.

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-general(at)postgresql(dot)org, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Questionaire: Common WAL write rates on busy servers.
Date: 2017-04-25 16:07:41
Message-ID: CAGTBQpbkKjgxEc_6FW_eUd-ny4KqKJVb1zSFmJ652zeehOMxPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

On Tue, Apr 25, 2017 at 1:17 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Questions (answer as many you can comfortably answer):
> - How many MB/s, segments/s do you see on busier servers?

~20MB/s with FPW compression, with peaks of ~35MB/s. Writes become the
bottleneck without compression and it tops at about 40-50MB/s, WAL
archiving cannot keep up beyond that point.

> - What generates the bulk of WAL on your servers (9.5+ can use
> pg_xlogdump --stats to compute that)?

Type N (%)
Record size (%) FPI size (%) Combined
size (%)
---- - ---
----------- --- -------- ---
------------- ---
XLOG 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
Transaction 30 ( 0.00)
960 ( 0.00) 0 ( 0.00) 960 (
0.00)
Storage 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
CLOG 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
Database 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
Tablespace 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
MultiXact 110 ( 0.01)
7456 ( 0.02) 0 ( 0.00) 7456 (
0.00)
RelMap 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
Standby 2 ( 0.00)
368 ( 0.00) 0 ( 0.00) 368 (
0.00)
Heap2 2521 ( 0.22)
78752 ( 0.24) 4656133 ( 2.82) 4734885 (
2.39)
Heap 539419 ( 46.52)
15646903 ( 47.14) 98720258 ( 59.87) 114367161 (
57.73)
Btree 606573 ( 52.31)
15872182 ( 47.82) 57514798 ( 34.88) 73386980 (
37.05)
Hash 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
Gin 2866 ( 0.25)
134330 ( 0.40) 4012251 ( 2.43) 4146581 (
2.09)
Gist 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
Sequence 7970 ( 0.69)
1450540 ( 4.37) 0 ( 0.00) 1450540 (
0.73)
SPGist 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
BRIN 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
CommitTs 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
ReplicationOrigin 0 ( 0.00)
0 ( 0.00) 0 ( 0.00) 0 (
0.00)
--------
-------- -------- --------
Total 1159491
33191491 [16.76%] 164903440 [83.24%] 198094931
[100%]

> - Are you seeing WAL writes being a bottleneck?OA

Sometimes, more so without FPW compression

> - What kind of backup methods are you using and is the WAL volume a
> problem?
> - What kind of replication are you using and is the WAL volume a
> problem?

Streaming to hot standby + WAL archiving, delayed standby as backup
and PITR. Backups are regular filesystem-level snapshots of the
delayed standby (with postgres down to get consistent snapshots).

WAL volume getting full during periods where the hot standby lags
behind (or when we have to stop it to create consistent snapshots) are
an issue indeed, and we've had to provision significant storage to be
able to absorb those peaks (1TB of WAL)

We bundle WAL segments into groups of 256 segments for archiving and
recovery to minimize the impact of TCP slow start. We further gzip
segments before transfer with pigz, and we use mostly rsync (with a
wrapper script that takes care of durability and error handling) to
move segments around. Getting the archive/recovery scripts to handle
the load hasn't been trivial.

> - What are your settings for wal_compression, max_wal_size (9.5+) /
> checkpoint_segments (< 9.5), checkpoint_timeout and wal_buffers?

wal_compression = on
max_wal_size = 12GB
min_wal_size = 2GB
checkpoint_timeout = 30min
wal_buffers = -1 (16MB effective)

> - Could you quickly describe your workload?

Steady stream of (preaggregated) input events plus upserts into ~12
partitioned aggregate "matviews" (within quotes since they're manually
maintained up to date).

Input rate is approximately 9000 rows/s without counting the upserts
onto the aggregate matviews. Old information is regularly compressed
and archived into less detailed partitions for a steady size of about
5TB.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Mike Blackwell 2017-04-25 16:26:53 Surprising results from array concatenation
Previous Message kerneltrick 2017-04-25 15:39:39 FDW table doesn't exist

Browse pgsql-performance by date

  From Date Subject
Next Message Jerry Sievers 2017-04-25 18:56:28 Re: Questionaire: Common WAL write rates on busy servers.
Previous Message bricklen 2017-04-25 14:19:31 Re: [GENERAL] Questionaire: Common WAL write rates on busy servers.