Multixact members limit exceeded

From: Peter Hunčár <hunci(at)hunci(dot)sk>
To: pgsql-general(at)postgresql(dot)org
Subject: Multixact members limit exceeded
Date: 2017-08-09 10:06:48
Message-ID: CADOjABkRdgK6wPRpv7T4CWEJLDLXD_RBnUYZbfqmhKQxw9rNBA@mail.gmail.com
Lists: pgsql-general

Hello

We have a fairly large PostgreSQL 9.4 database (currently 70 TB and growing
to approx. 80 TB) running on CentOS 7.
The hardware is a 48-core Xeon with 180 GB of RAM, with data on enterprise-grade
SAN storage.

We started feeding it several weeks ago and everything went smoothly until
we hit this issue:

2017-08-09 05:21:50.946 WIB >DETAIL: This command would create a multixact
with 2 members, but the remaining space is only enough for 0 members.
2017-08-09 05:21:50.946 WIB >HINT: Execute a database-wide VACUUM in
database with OID 20101 with reduced vacuum_multixact_freeze_min_age and
vacuum_multixact_freeze_table_age settings.

I did what the hint proposes (roughly the commands sketched below), but on such
a large database a vacuum takes several days.
We are currently in the data-loading phase and are sending only INSERT
statements; there should be very few UPDATEs or DELETEs.
Yesterday the message disappeared for a short while, but today it's back
(the vacuum is still running).
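
For reference, "what the hint proposes" amounts to roughly the following; the
zero freeze ages here are only illustrative, not necessarily the exact values
I used:

SET vacuum_multixact_freeze_min_age = 0;    -- freeze multixacts aggressively
SET vacuum_multixact_freeze_table_age = 0;  -- make vacuum scan whole tables rather than skipping pages
VACUUM;                                     -- database-wide; on a ~70 TB cluster this runs for days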

*Is there a way to prevent/fix this so we can finish the loading (97% done)?
Performance has dropped from 100 ops/sec to 15 ops/min.*

Most tables have around 150 M rows with TOAST data.
Several huge tables with TOAST data are currently being autovacuumed; I guess
this is the reason for the performance drop:

 | usename  | application_name | state  | backend_xmin | query
-+-----------+------------------+--------+--------------+------------------------------------------------------------------------
 | postgres  |                  | active |   1683428686 | autovacuum: VACUUM pg_toast.pg_toast_14548803 (to prevent wraparound)
 | postgres  |                  | active |   1683428693 | autovacuum: VACUUM pg_toast.pg_toast_14548821 (to prevent wraparound)
 | postgres  |                  | active |   1683428705 | autovacuum: VACUUM pg_toast.pg_toast_14548828 (to prevent wraparound)
 | postgres  |                  | active |   1683428719 | autovacuum: VACUUM pg_toast.pg_toast_14548835 (to prevent wraparound)
 | postgres  |                  | active |   1683428732 | autovacuum: VACUUM pg_toast.pg_toast_14549150 (to prevent wraparound)
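
(The listing above is from pg_stat_activity; a query along these lines, trimmed
to the interesting columns, shows the wraparound-prevention workers:)

SELECT usename, application_name, state, backend_xmin, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';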

After raising vacuum_cost_limit to 10000 for one night, I saw 200 MB/s of
writes the whole night, but I had to change it back to 2000 because the insert
performance went to zero.
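
(For the record, raising the limit cluster-wide can be done roughly like this,
assuming no per-table autovacuum cost settings override it; the exact method I
used may differ:)

ALTER SYSTEM SET vacuum_cost_limit = 10000;  -- let (auto)vacuum do more I/O per unit time
SELECT pg_reload_conf();
-- ...and back down again when foreground INSERT throughput suffers:
ALTER SYSTEM SET vacuum_cost_limit = 2000;
SELECT pg_reload_conf();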

The TOAST tables being autovacuumed mostly belong to these tables:

SELECT relname, age(relminmxid) as mxid_age,
pg_size_pretty(pg_table_size(oid)) as table_size
FROM pg_class
WHERE relkind = 'r' and pg_table_size(oid) > 100000000
ORDER BY age(relminmxid) DESC LIMIT 20;

 relname | mxid_age   | table_size
---------+------------+------------
 t1      | 1554084722 | 172 GB
 t2      | 1554084722 | 10 TB
 t3      | 1554084722 | 21 GB
 t4      | 1554084722 | 8928 MB
 t5      | 1554084722 | 43 GB
 t6      | 1554084722 | 2788 GB
 t7      | 1554084722 | 290 GB
 t8      | 1554084722 | 11 GB
 t9      | 1554084722 | 2795 GB
 t10     | 1554084722 | 20 GB
 t11     | 1554084722 | 16 TB
There is a large number of member files in pg_multixact/members:

-bash-4.2$ ll ../data/pg_multixact/members/ | wc -l
82041

The oldest file dates from the beginning of July.
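
If I understand the 9.4 multixact member SLRU layout correctly (about 1636
members per 8 kB page and 32 pages per segment file; both are assumptions about
the internals), roughly 82,000 segment files is what you get when the entire
2^32 member address space is used up, which would also match
NextMultiOffset = 4294967295, i.e. 2^32 - 1, in the pg_controldata output below:

SELECT ceil(4294967296::numeric / (1636 * 32)) AS segments_when_member_space_full;  -- ~82041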

pg_controldata output:

-bash-4.2$ pg_controldata /var/lib/pgsql/9.4/data/
pg_control version number: 942
Catalog version number: 201409291
Database system identifier: 6421090142329829830
Database cluster state: in production
pg_control last modified: Wed 09 Aug 2017 03:48:47 PM WIB
Latest checkpoint location: 589E/9DB366A0
Prior checkpoint location: 589E/95E8DEE0
Latest checkpoint's REDO location: 589E/963FC1B8
Latest checkpoint's REDO WAL file: 000000010000589E00000096
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/1683422006
Latest checkpoint's NextOID: 2293738628
Latest checkpoint's NextMultiXactId: 129346573
Latest checkpoint's NextMultiOffset: 4294967295
Latest checkpoint's oldestXID: 784247712
Latest checkpoint's oldestXID's DB: 20101
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 11604
Latest checkpoint's oldestMulti's DB: 20101
Time of latest checkpoint: Wed 09 Aug 2017 03:34:33 PM WIB
Fake LSN counter for unlogged rels: 0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: minimal
Current wal_log_hints setting: off
Current max_connections setting: 1200
Current max_worker_processes setting: 8
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0

Thank you

Peter Huncar
