| From: | "Simon Riggs" <simon(at)2ndquadrant(dot)com> | 
|---|---|
| To: | "Bruce Momjian" <bruce(at)momjian(dot)us> | 
| Cc: | "Jeremy Haile" <jhaile(at)fastmail(dot)fm>, <pgsql-performance(at)postgresql(dot)org> | 
| Subject: | Re: URGENT: Out of disk space pg_xlog | 
| Date: | 2006-12-29 18:18:18 | 
| Message-ID: | 1167416298.3903.230.camel@silverbirch.site | 
| Lists: | pgsql-performance | 
On Sat, 2006-12-23 at 13:13 -0500, Bruce Momjian wrote:
> The bottom line is that we know of no cases where a long-running
> transaction would delay recycling of the WAL files, so there is
> certainly something not understood here.
We can see from all of this that a checkpoint definitely didn't occur.
Tom's causal chain was just one way that could have happened; there
could well be others.
I've noticed previously that a checkpoint can be starved out when trying
to acquire the CheckpointStartLock. I've witnessed a delay of more than
two minutes in obtaining the lock under a heavy transaction load.
If wal_buffers is small enough, and the WAL write rate and the
transaction rate are both high enough, a long queue can form for the
WALWriteLock, which ensures that the request for the
CheckpointStartLock queues indefinitely.
I've tried implementing a queueable shared lock for the
CheckpointStartLock. That helps the checkpoint, but it harms performance
of other transactions waiting to commit, so I let that idea go.
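
The trade-off described above can be illustrated with a toy discrete-time model. This is only a sketch under stated assumptions, not PostgreSQL's lock manager code: the function `simulate`, its parameters, and the tick-based timing are all hypothetical. Transactions take a shared lock, the checkpoint wants it exclusively, and the `fair` flag stands in for the "queueable shared lock" idea, where shared requests that arrive while the checkpoint is waiting queue behind it.

```python
def simulate(arrivals, hold, checkpoint_at, checkpoint_hold, fair, horizon):
    """Toy model of one exclusive waiter competing with shared lockers.

    arrivals        -- ticks at which a transaction requests the shared lock
    hold            -- ticks each transaction holds the shared lock
    checkpoint_at   -- tick at which the checkpoint requests the exclusive lock
    checkpoint_hold -- ticks the checkpoint holds the exclusive lock
    fair            -- True: shared requests queue behind a waiting checkpoint
    horizon         -- number of ticks to simulate

    Returns (checkpoint_wait, total_commit_wait); checkpoint_wait is None
    if the checkpoint never obtained the lock within the horizon.
    """
    pending = sorted(arrivals)
    idx = 0
    release_times = []   # release ticks of current shared holders
    queued = []          # arrival ticks of shared requests still waiting
    ckpt_granted = None
    ckpt_release = None
    commit_wait = 0

    for t in range(horizon):
        # Shared holders whose hold time has expired release the lock.
        release_times = [r for r in release_times if r > t]

        ckpt_holding = ckpt_granted is not None and t < ckpt_release
        ckpt_waiting = ckpt_granted is None and t >= checkpoint_at

        # The exclusive lock is granted only when no shared holder remains.
        if ckpt_waiting and not release_times:
            ckpt_granted = t
            ckpt_release = t + checkpoint_hold
            ckpt_holding, ckpt_waiting = True, False

        # New shared requests arriving at this tick.
        while idx < len(pending) and pending[idx] == t:
            queued.append(t)
            idx += 1

        # Shared requests always wait for an exclusive holder; in fair mode
        # they also wait behind an exclusive waiter.
        blocked = ckpt_holding or (fair and ckpt_waiting)
        if not blocked:
            for arrived in queued:
                commit_wait += t - arrived
                release_times.append(t + hold)
            queued = []

    ckpt_wait = None if ckpt_granted is None else ckpt_granted - checkpoint_at
    return ckpt_wait, commit_wait


# One transaction per tick, each holding the shared lock for 2 ticks.
unfair = simulate(range(50), 2, 10, 3, fair=False, horizon=50)
queued_shared = simulate(range(50), 2, 10, 3, fair=True, horizon=50)
print(unfair)         # (None, 0): checkpoint starved, commits never wait
print(queued_shared)  # (1, 10): checkpoint waits 1 tick, commits wait 10 in total
```

In this model the overlapping shared holders never all drain at once, so the non-queueing variant starves the checkpoint for the whole run, while the queueing variant lets it through quickly at the cost of stalling the transactions that arrive behind it.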
-- 
  Simon Riggs             
  EnterpriseDB   http://www.enterprisedb.com