Skip site navigation (1) Skip section navigation (2)

Re: Load Distributed Checkpoints, take 3

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Load Distributed Checkpoints, take 3
Date: 2007-06-21 08:32:52
Message-ID: 467A37B4.7030708@enterprisedb.com (view raw or flat)
Thread:
Lists: pgsql-patches
Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> I added a spinlock to protect the signaling fields between bgwriter and 
>> backends. The current non-locking approach gets really difficult as the 
>> patch adds two new flags, and both are more important than the existing 
>> ckpt_time_warn flag.
> 
> That may be, but you could minimize the cost and notational ugliness by
> not using the spinlock where you don't have to.  Put the sig_atomic_t
> fields back the way they were, and many of the uses of the spinlock go
> away.  All you really need it for is the places where the additional
> flags are set or read.

I find it easier to understand if it's used whenever any of the fields 
are accessed. You don't need to read and write ckpt_failed and 
ckpt_started/ckpt_done in specific order anymore, for example.

> Some other comments:
> 
> I tend to agree with whoever said upthread that the combination of GUC
> variables proposed here is confusing and ugly.  It'd make more sense to
> have min and max checkpoint rates in KB/s, with the max checkpoint rate
> only honored when we are predicting we'll finish before the next
> checkpoint time.

Really? I thought everyone is happy with the current combination, and 
that it was just the old trio of parameters controlling the write, nap 
and sync phases that was ugly. One particularly nice thing about 
defining the duration as a fraction of checkpoint interval is that we 
can come up with a reasonable default value that doesn't depend on your 
hardware.

How would a min and max rate work?

Anyone else have an opinion on the parameters?

> The flow of control and responsibility between xlog.c, bgwriter.c and
> bufmgr.c seems to have reached the breaking point of unintelligibility.
> Can you think of a refactoring that would simplify it?  We might have
> to resign ourselves to this much messiness, but I'd rather not...

The problem we're trying to solve is doing a checkpoint while running 
the normal bgwriter activity at the same time. The normal way to do two 
things simultaneously is to have two different processes (or threads). I 
thought about having a separate checkpoint process, but I gave up on 
that thought because the communication needed between backends, bgwriter 
and the checkpointer seems like a mess. The checkpointer would need the 
pendingOpsTable so that it can do the fsyncs, and it would also need to 
receive the forget-messages to that table. We could move that table 
entirely to the checkpointer, but since bgwriter is presumably doing 
most of the writes, there would be a lot of chatter between bgwriter and 
the checkpointer.

The current approach is like co-operative multitasking. BufferSyncs 
yields control to bgwriter every now and then.

The division of labor between xlog.c and other modules is not that bad, 
IMO. CreateCheckPoint is the main entry point to create a checkpoint, 
and it calls other modules to do their stuff, including bufmgr.c.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

In response to

Responses

pgsql-patches by date

Next:From: Heikki LinnakangasDate: 2007-06-21 12:33:35
Subject: Re: Load Distributed Checkpoints, take 3
Previous:From: Heikki LinnakangasDate: 2007-06-21 08:14:54
Subject: Re: Load Distributed Checkpoints, take 3

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group