Load Distributed Checkpoints, take 3

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Load Distributed Checkpoints, take 3
Date: 2007-06-20 13:47:31
Message-ID: 46792FF3.8000301@enterprisedb.com
Lists: pgsql-patches

Here's an updated WIP patch for load distributed checkpoints.

I added a spinlock to protect the signaling fields between bgwriter and
backends. The current non-locking approach gets really difficult as the
patch adds two new flags, and both are more important than the existing
ckpt_time_warn flag.
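
As a rough illustration of the idea (not the code in the attached
patch), the sketch below guards a small signaling struct with a
test-and-set spinlock built on C11 atomics. The field names
ckpt_started, ckpt_done and ckpt_time_warn come from this message; the
struct name, the two extra flag names (ckpt_force, ckpt_immediate) and
all function names are hypothetical, and real PostgreSQL code would use
its own spinlock primitives rather than atomic_flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bgwriter/backend signaling area. */
typedef struct {
    atomic_flag lock;        /* test-and-set spinlock (assumption) */
    int  ckpt_started;       /* advanced when a checkpoint begins */
    int  ckpt_done;          /* advanced when a checkpoint finishes */
    bool ckpt_time_warn;     /* existing flag named in the message */
    bool ckpt_force;         /* hypothetical new flag */
    bool ckpt_immediate;     /* hypothetical new flag */
} CkptSignal;

typedef struct {
    bool time_warn;
    bool force;
    bool immediate;
} CkptFlags;

static void sig_lock(CkptSignal *s)
{
    /* Spin until the previous value of the flag was "clear". */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;
}

static void sig_unlock(CkptSignal *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

void ckpt_signal_init(CkptSignal *s)
{
    atomic_flag_clear(&s->lock);
    s->ckpt_started = 0;
    s->ckpt_done = 0;
    s->ckpt_time_warn = false;
    s->ckpt_force = false;
    s->ckpt_immediate = false;
}

/* Backend side: record the request and snapshot ckpt_started in ONE
 * critical section, so the two can never be observed out of step. */
int ckpt_request(CkptSignal *s, CkptFlags want)
{
    int started;

    sig_lock(s);
    if (want.time_warn) s->ckpt_time_warn = true;
    if (want.force)     s->ckpt_force = true;
    if (want.immediate) s->ckpt_immediate = true;
    started = s->ckpt_started;
    sig_unlock(s);
    return started;
}

/* Bgwriter side: consume all pending flags and advance ckpt_started
 * in ONE critical section. */
CkptFlags ckpt_begin(CkptSignal *s)
{
    CkptFlags got;

    sig_lock(s);
    got.time_warn = s->ckpt_time_warn;
    got.force = s->ckpt_force;
    got.immediate = s->ckpt_immediate;
    s->ckpt_time_warn = false;
    s->ckpt_force = false;
    s->ckpt_immediate = false;
    s->ckpt_started++;
    assert(s->ckpt_done < s->ckpt_started);
    sig_unlock(s);
    return got;
}
```

With this shape, a request either lands before ckpt_begin (and the
checkpoint sees its flags) or after it (and the requester's snapshot
already reflects the started checkpoint, so it waits for a later one).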

In fact, I think there's a small race condition in CVS HEAD:

1. pg_start_backup() is called, which calls RequestCheckpoint
2. RequestCheckpoint takes note of the old value of ckpt_started
3. bgwriter wakes up from pg_usleep, and sees that we've exceeded
checkpoint_timeout.
4. bgwriter increases ckpt_started to note that a new checkpoint has started
5. RequestCheckpoint signals bgwriter to start a new checkpoint
6. bgwriter calls CreateCheckpoint, with the force-flag set to false
because this checkpoint was triggered by timeout
7. RequestCheckpoint sees that ckpt_started has increased, and starts to
wait for ckpt_done to reach the new value.
8. CreateCheckpoint finishes immediately, because there was no XLOG
activity since the last checkpoint.
9. RequestCheckpoint sees that ckpt_done matches ckpt_started, and returns.
10. pg_start_backup() continues, with potentially the same redo location
and thus the same history filename as the previous backup.
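
The interleaving above can be replayed deterministically in a small,
single-threaded model. This is only a sketch under simplifying
assumptions: the struct and function names are hypothetical, the
"skip the checkpoint when there was no XLOG activity and force is
false" rule is reduced to one boolean, and no real signaling or
sleeping is involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the two shared counters named in the message. */
typedef struct {
    int ckpt_started;
    int ckpt_done;
} BgWriterCounters;

typedef struct {
    bool requester_returned;  /* did RequestCheckpoint return? */
    bool checkpoint_written;  /* was a new checkpoint actually written? */
} RaceOutcome;

/* Replays steps 2-9 in the order listed above. */
RaceOutcome replay_race(bool xlog_activity_since_last_checkpoint)
{
    BgWriterCounters shm = {0, 0};
    RaceOutcome out = {false, false};
    int old_started;
    bool force;

    /* Step 2: requester remembers the old value of ckpt_started. */
    old_started = shm.ckpt_started;

    /* Steps 3-4: bgwriter wakes up on its own because of the timeout,
     * so it decides on force = false, and bumps ckpt_started. */
    force = false;
    shm.ckpt_started++;

    /* Step 5: the requester's signal arrives only now, too late to
     * influence the checkpoint that has already begun. */

    /* Steps 6 and 8: without force, an idle system skips the work. */
    out.checkpoint_written = force || xlog_activity_since_last_checkpoint;
    shm.ckpt_done = shm.ckpt_started;
    assert(shm.ckpt_done <= shm.ckpt_started);

    /* Steps 7 and 9: requester sees started advance, then done catch up. */
    if (shm.ckpt_started != old_started &&
        shm.ckpt_done == shm.ckpt_started)
        out.requester_returned = true;

    return out;
}
```

Calling replay_race(false) models the idle case from step 8: the
requester returns although no new checkpoint was written, which is
what lets step 10 reuse the previous redo location.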

Now I admit that the chances of that happening are extremely small;
people don't usually make two pg_start_backup calls without *any*
WAL-logged activity in between them, for example. But as we add the new
flags, avoiding scenarios like that becomes harder.

Since the last patch, I did some cleanup and refactoring, and added a
bunch of comments and user documentation.

I haven't yet changed GetInsertRecPtr to use the almost up-to-date value
protected by the info_lck per Simon's suggestion, and I need to do some
correctness testing. After that, I'm done with the patch.

PS. In case you wonder what took me so long since the last revision,
I've spent a lot of time reviewing HOT.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment: ldc-justwrites-3.patch (text/x-diff, 53.8 KB)
