We're using PostgreSQL 8.1.0 on AIX 5.3 through NFS (a Netapp Filer hosts the database files), and we're encoutering somes issues with vaccums. PostgreSQL binaries are built with xlc 6 (C for AIX Compiler 22.214.171.124) on AIX 5.2 (yes, I know, building on 5.2 and running on 5.3 is not the best way to avoid bugs...).
We have strong performance constraints with this database, so we planned vacuums with a crontab :
- Frequent vacuum analyze on some heavily-updated tables (few rows, but a lot of insert/update/delete). The frequency varies between 5 and 15 minutes.
- A nightly (not FULL) vacuum on the entire database.
We don't use autovacuum or FULL vacuum, because the high havailability needed for the database. We prefer to keep it under control.
Since some weeks, the amount of data hosted by the database grows, and, some nights, the database vacuum seems to "freeze" during his execution. In verbose mode, the logs show that the vacuum clean up a table (not always the same table), and... no more news. The system shows a vacuum process, which seems to be sleeping (no CPU used, no memory consumption...). In addition, the logs of our application show that database transactions seems to be slower.
For some internal reasons, the only way for us to workaround this problem is to shutdown of the application and the database. After a full restart, things are ok.
Some questions :
1) During the nightly database vacuum, some vacuums run concurrently (vacuums on heavily-updated tables). Can this concurrency cause some deadlocks ? We're planning to patch our shell scripts to avoid this concurrency.
2) I believed that the poor performances during the vacuum freeze were due to the obsolete data statistics. But after a full restart of the dabatase, performances are good. Does PostgreSQL rebuild his statistics during startup ?
3) Can we explain the freeze with a bad database configuration ? For instance, postgreSQL running out of connections, or whatever, causing the vacuum process to wait for free ressources ?
4) This morning, just before the database vacuum freeze, the logs show this error :
<2007-06-13 03:20:35 DFT%>ERROR: could not open relation 16391/16394/107937: A system call received an interrupt.
<2007-06-13 03:20:35 DFT%>CONTEXT: writing block 2 of relation 16391/16394/107937
<2007-06-13 03:20:40 DFT%>LOG: could not fsync segment 0 of relation 16392/16394/107925: A system call received an interrupt.
<2007-06-13 03:20:40 DFT%>ERROR: storage sync failed on magnetic disk: A system call received an interrupt.
This is the first time we're encountering this error. Can it be a cause of the vacuum freeze ?
Capgemini Telecom & Media / ITDR
tel : 02 99 27 82 30
e-mail : loic(dot)restoux(at)capgemini(dot)com
This message contains information that may be privileged or confidential and is the property of the Capgemini Group. It is intended only for the person to whom it is addressed. If you are not the intended recipient, you are not authorized to read, print, retain, copy, disseminate, distribute, or use this message or any part thereof. If you receive this message in error, please notify the sender immediately and delete all copies of this message.
pgsql-performance by date
|Next:||From: Markus Schiltknecht||Date: 2007-06-13 16:43:35|
|Subject: Re: dbt2 NOTPM numbers|
|Previous:||From: Mark Wong||Date: 2007-06-13 16:33:22|
|Subject: Re: dbt2 NOTPM numbers|