
From: Natalie Wenz <nataliewenz(at)ebureau(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Vacuum to prevent wraparound data loss
Date: 2012-03-09 00:50:57
Message-ID: 0008A0CD-58B3-4FC8-9976-E5D2B41E5E82@ebureau.com
Lists: pgsql-admin

Hi!

I have a few questions regarding vacuum behavior. But first, some background:

We're running Postgres version 9.1.2 on FreeBSD 8.2 stable.

We did a large data-only dump of a single table (it was 12TB when we dumped it) and restored it to a new machine while the database in the new location was live. The bulk load took over two weeks to finish, and since the load ran so long while the database was also accepting inserts the whole time, we ran into some trouble with transaction ids.
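For concreteness, the dump and restore were of this general shape (names, paths, and flags here are illustrative placeholders, not our exact commands):

pg_dump --data-only --table=bigtable --format=custom --file=bigtable.dump sourcedb
pg_restore --dbname=ourdb bigtable.dump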

Here is the summary of events:

2/3  pg_restore started
2/6  autovacuum started: VACUUM public.bigtable (to prevent wraparound)
2/17 noticed this in the logfile (it may have started complaining earlier):
       WARNING: oldest xmin is far in the past
     disabled external connections
2/21 restore finished; re-enabled external connections and left the autovacuum to finish, then saw:
       WARNING: database "postgres" must be vacuumed within 1000001 transactions
       ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
     disabled external connections again
… 3/8 the vacuum is still running. CPU usage for that pid fluctuates between 3% and 100%, and we see it doing some reads and some writes. (Is there any way to know what it's doing?) The logfile continues to be spammed with:
       ERROR: database is not accepting commands to avoid wraparound data loss in database "postgres"
       HINT: Stop the postmaster and use a standalone backend to vacuum that database.
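For what it's worth, the closest I can get to a progress check is watching the database-level xid age and the worker's row in pg_stat_activity; read-only queries still work in this state since they don't consume an xid, so something like this should be safe to run:

-- database-level xid age; this should fall once the vacuum's work takes effect
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY age(datfrozenxid) DESC;

-- the autovacuum worker's current activity (9.1 column names: procpid, current_query)
SELECT procpid, query_start, current_query
  FROM pg_stat_activity
 WHERE current_query LIKE 'autovacuum:%';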

My first question is: is it even possible for this vacuum to finish? It began before the database stopped accepting connections, but now that we're in this state, will the database allow it to complete? Can it successfully freeze old xids and advance the datfrozenxid? I suspect not, based on this passage from the documentation:

"(A manual VACUUM should fix the problem, as suggested by the hint; but note that the VACUUM must be performed by a superuser, else it will fail to process system catalogs and thus not be able to advance the database's datfrozenxid.) If these warnings are ignored, the system will shut down and refuse to execute any new transactions once there are fewer than 1 million transactions left until wraparound."

Can someone confirm? (And perhaps explain why? Also, would a manual vacuum executed from within the database work, or only one run from the standalone backend, as the hint suggests?)
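For reference, my understanding of the standalone-backend route is roughly the following (the data directory path is just an example, not our actual one):

# stop the postmaster first
pg_ctl stop -D /usr/local/pgsql/data

# then vacuum in single-user mode, once per database that needs it
postgres --single -D /usr/local/pgsql/data postgres
backend> VACUUM;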

If the autovacuum that is currently running is indeed doomed, then my next question is: VACUUM or VACUUM FREEZE? Would one of them run significantly faster than the other? And is it possible, or recommended, to give the vacuum (or vacuum freeze) more resources when using the standalone backend? Our current tuning parameters and other possibly relevant information are below.
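By "more resources" I mean something along these lines in the standalone backend (the values are guesses on my part, and I believe VACUUM caps its dead-tuple list at about 1GB regardless of maintenance_work_mem):

SET maintenance_work_mem = '2GB';  -- up from our configured 1024MB
SET vacuum_cost_delay = 0;         -- manual VACUUM is unthrottled by default, but just to be sure
VACUUM FREEZE public.bigtable;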

Any help or insight is greatly appreciated!

Thanks,
Natalie

The machine has 24GB of memory. The database shows "bigtable" at 14TB now, and the other large table at about 250GB; disk space is 25TB total, with 18TB used.

system tuning:

kern.ipc.semmni=87381 //Number of semaphore identifiers
kern.ipc.semmns=87381 //Maximum number of semaphores in the system
kern.ipc.semmnu=87381 //Maximum number of undo structures in the system
kern.ipc.shmmni=1024 //Number of shared memory identifiers
kern.maxfiles=450000

/etc/sysctl.conf
kern.ipc.shmmax=1073741824 //Maximum shared memory segment size
kern.ipc.shmmin=1 //Minimum shared memory segment size
kern.ipc.shmseg=128 //Number of segments per process
kern.ipc.shmall=524288 //bumped up from 262144 //Maximum number of pages available for shared memory

postgresql.conf tuning parameters (changed from defaults or uncommented):

max_connections = 500
shared_buffers = 128MB
temp_buffers = 64MB
work_mem = 128MB
maintenance_work_mem = 1024MB
synchronous_commit = off
seq_page_cost = 1.0
random_page_cost = 1.0
effective_cache_size = 12GB
vacuum_cost_limit = 500
wal_buffers = 16MB
checkpoint_segments = 128
autovacuum = on
log_autovacuum_min_duration = 10000
autovacuum_max_workers = 3
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.1
autovacuum_freeze_max_age = 800000000
autovacuum_vacuum_cost_delay = 0ms
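One note on autovacuum_freeze_max_age: at 800000000, the wraparound-prevention autovacuum fires once a table's age(relfrozenxid) passes 800 million, still well short of the roughly 2-billion hard limit. A quick table-level view of how close each table is:

SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY age(relfrozenxid) DESC
 LIMIT 10;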
