From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
---|---|
To: | Patches <pgsql-patches(at)postgresql(dot)org> |
Subject: | Load Distributed Checkpoints, take 3 |
Date: | 2007-06-20 13:47:31 |
Message-ID: | 46792FF3.8000301@enterprisedb.com |
Lists: | pgsql-patches |
Here's an updated WIP patch for load distributed checkpoints.
I added a spinlock to protect the signaling fields between bgwriter and
backends. The current non-locking approach gets really difficult as the
patch adds two new flags, and both are more important than the existing
ckpt_time_warn flag.
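As a rough illustration of the idea (not code from the patch), the sketch below guards a group of signaling fields with one lock so that a backend sets them together and bgwriter consumes them together. Everything in it is an assumption for illustration: the struct layout, the function names, the two placeholder flags new_flag_a and new_flag_b (the real names are in the patch), and the C11 atomic_flag standing in for PostgreSQL's own spinlock primitive.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared bgwriter signaling area. */
typedef struct {
    atomic_flag lock;      /* toy spinlock, NOT PostgreSQL's slock_t */
    int  ckpt_started;     /* advanced when a checkpoint starts */
    int  ckpt_done;        /* advanced when a checkpoint finishes */
    bool ckpt_time_warn;   /* existing request flag */
    bool new_flag_a;       /* placeholder for a new flag (assumed name) */
    bool new_flag_b;       /* placeholder for a new flag (assumed name) */
} SignalArea;

/* Snapshot of the request flags as seen by bgwriter. */
typedef struct {
    bool time_warn;
    bool flag_a;
    bool flag_b;
} Request;

static void area_init(SignalArea *a)
{
    atomic_flag_clear(&a->lock);
    a->ckpt_started = 0;
    a->ckpt_done = 0;
    a->ckpt_time_warn = false;
    a->new_flag_a = false;
    a->new_flag_b = false;
}

static void area_lock(SignalArea *a)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
        ;
}

static void area_unlock(SignalArea *a)
{
    atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/*
 * Backend side: set the requested flags and read ckpt_started in one
 * critical section, so the two can never be observed out of step.
 */
static int request_checkpoint(SignalArea *a, bool warn, bool fa, bool fb)
{
    int old_started;

    area_lock(a);
    old_started = a->ckpt_started;
    if (warn) a->ckpt_time_warn = true;
    if (fa)   a->new_flag_a = true;
    if (fb)   a->new_flag_b = true;
    area_unlock(a);
    return old_started;
}

/*
 * bgwriter side: consume all pending flags and bump ckpt_started in the
 * same critical section, so a request is either fully part of this
 * checkpoint or fully left for the next one.
 */
static Request begin_checkpoint(SignalArea *a)
{
    Request r;

    area_lock(a);
    r.time_warn = a->ckpt_time_warn;
    r.flag_a = a->new_flag_a;
    r.flag_b = a->new_flag_b;
    a->ckpt_time_warn = false;
    a->new_flag_a = false;
    a->new_flag_b = false;
    a->ckpt_started++;
    area_unlock(a);
    return r;
}
```

The point of the design is only that reading the flags and advancing ckpt_started happen atomically with respect to a requester; with lock-free flags that guarantee has to be argued separately for each flag.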
In fact, I think there's a small race condition in CVS HEAD:
1. pg_start_backup() is called, which calls RequestCheckpoint
2. RequestCheckpoint takes note of the old value of ckpt_started
3. bgwriter wakes up from pg_usleep, and sees that we've exceeded
checkpoint_timeout.
4. bgwriter increases ckpt_started to note that a new checkpoint has started
5. RequestCheckpoint signals bgwriter to start a new checkpoint
6. bgwriter calls CreateCheckpoint, with the force-flag set to false
because this checkpoint was triggered by timeout
7. RequestCheckpoint sees that ckpt_started has increased, and starts to
wait for ckpt_done to reach the new value.
8. CreateCheckpoint finishes immediately, because there was no XLOG
activity since the last checkpoint.
9. RequestCheckpoint sees that ckpt_done matches ckpt_started, and returns.
10. pg_start_backup() continues, with potentially the same redo location
and thus history filename as the previous backup.
Now I admit that the chances of that happening are extremely small;
people don't usually make two pg_start_backup calls without *any*
WAL-logged activity in between, for example. But as we add the new
flags, avoiding scenarios like that becomes harder.
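The interleaving in the numbered steps can be replayed deterministically in a toy model. This is a single-threaded sketch under loud assumptions: ToyState, the toy_ and replay_ functions, and the integer WAL positions are all hypothetical, and the skip rule is a simplification of what CreateCheckpoint does when no WAL has been logged since the last checkpoint.

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
typedef struct {
    int  ckpt_started;
    int  ckpt_done;
    long wal_pos;        /* end of WAL (toy integer position) */
    long last_ckpt_end;  /* wal_pos right after the last checkpoint record */
    long redo_pos;       /* redo location of the last real checkpoint */
} ToyState;

/* Toy CreateCheckpoint: an unforced call is skipped when nothing was logged. */
static bool toy_create_checkpoint(ToyState *s, bool force)
{
    if (!force && s->wal_pos == s->last_ckpt_end)
        return false;               /* finishes immediately, redo unchanged */
    s->redo_pos = s->wal_pos;       /* new redo location */
    s->wal_pos += 1;                /* the checkpoint record itself */
    s->last_ckpt_end = s->wal_pos;
    return true;
}

/* State right after a first backup's forced checkpoint. */
static ToyState toy_after_first_backup(void)
{
    ToyState s = { 0, 0, 100, 100, 0 };

    s.ckpt_started++;
    toy_create_checkpoint(&s, true);
    s.ckpt_done = s.ckpt_started;
    return s;
}

/* Replays steps 1-10; returns the redo location the second backup sees. */
static long replay_race(ToyState *s)
{
    int old_started = s->ckpt_started;   /* step 2: note old ckpt_started */

    s->ckpt_started++;                   /* steps 3-4: timeout checkpoint starts */
    /* step 5: the backend's signal arrives after the checkpoint has begun */
    toy_create_checkpoint(s, false);     /* steps 6 and 8: unforced, skipped */
    s->ckpt_done = s->ckpt_started;

    /* steps 7 and 9: the backend believes its request was served */
    if (s->ckpt_started != old_started && s->ckpt_done == s->ckpt_started)
        return s->redo_pos;              /* step 10 */
    return -1;                           /* not reached in this replay */
}

/* For contrast: the same request served by a forced checkpoint. */
static long replay_forced(ToyState *s)
{
    s->ckpt_started++;
    toy_create_checkpoint(s, true);
    s->ckpt_done = s->ckpt_started;
    return s->redo_pos;
}
```

In this model replay_race returns the same redo position that the first backup got, while replay_forced returns a later one, which is the difference the force-flag is meant to guarantee.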
Since the last patch, I've done some cleanup and refactoring, and added a bunch
of comments and user documentation.
I haven't yet changed GetInsertRecPtr to use the almost up-to-date value
protected by the info_lck per Simon's suggestion, and I need to do some
correctness testing. After that, I'm done with the patch.
PS. In case you're wondering what took me so long since the last revision, I've
spent a lot of time reviewing HOT.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
ldc-justwrites-3.patch | text/x-diff | 53.8 KB |