Re: Load Distributed Checkpoints, take 3

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Load Distributed Checkpoints, take 3
Date: 2007-06-21 08:32:52
Message-ID: 467A37B4.7030708@enterprisedb.com
Lists: pgsql-patches

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> I added a spinlock to protect the signaling fields between bgwriter and
>> backends. The current non-locking approach gets really difficult as the
>> patch adds two new flags, and both are more important than the existing
>> ckpt_time_warn flag.
>
> That may be, but you could minimize the cost and notational ugliness by
> not using the spinlock where you don't have to. Put the sig_atomic_t
> fields back the way they were, and many of the uses of the spinlock go
> away. All you really need it for is the places where the additional
> flags are set or read.

I find it easier to understand if the spinlock is taken whenever any of
the fields is accessed. For example, you no longer need to read and
write ckpt_failed and ckpt_started/ckpt_done in a specific order.
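
To be concrete, here's roughly what I have in mind (the field and
function names below are illustrative, not a verbatim copy of the
patch):

#include "storage/spin.h"

typedef struct
{
    slock_t     ckpt_lck;           /* protects all the ckpt_* fields below */
    int         ckpt_started;       /* advances when a checkpoint starts */
    int         ckpt_done;          /* advances when a checkpoint finishes */
    int         ckpt_failed;        /* advances when a checkpoint fails */
    bool        ckpt_rqst_time_warn;    /* warn if checkpoints are too frequent */
    bool        ckpt_rqst_immediate;    /* skip the write throttling */
} BgWriterShmemStruct;

static BgWriterShmemStruct *BgWriterShmem;

/* Backend side: set the request flags and note the current cycle counters */
static void
request_checkpoint(bool warn, bool immediate, int *old_started, int *old_failed)
{
    SpinLockAcquire(&BgWriterShmem->ckpt_lck);
    *old_started = BgWriterShmem->ckpt_started;
    *old_failed = BgWriterShmem->ckpt_failed;
    if (warn)
        BgWriterShmem->ckpt_rqst_time_warn = true;
    if (immediate)
        BgWriterShmem->ckpt_rqst_immediate = true;
    SpinLockRelease(&BgWriterShmem->ckpt_lck);
}

Because every access goes through the same lock, the counters and flags
are always seen in a consistent state, and the careful ordering of reads
and writes goes away.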

> Some other comments:
>
> I tend to agree with whoever said upthread that the combination of GUC
> variables proposed here is confusing and ugly. It'd make more sense to
> have min and max checkpoint rates in KB/s, with the max checkpoint rate
> only honored when we are predicting we'll finish before the next
> checkpoint time.

Really? I thought everyone was happy with the current combination, and
that it was just the old trio of parameters controlling the write, nap
and sync phases that was ugly. One particularly nice thing about
defining the duration as a fraction of the checkpoint interval is that
we can come up with a reasonable default value that doesn't depend on
your hardware.
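
For example (variable names below are just for illustration, not the
actual GUCs):

    double  checkpoint_smoothing = 0.5;   /* fraction of the checkpoint interval */
    int     checkpoint_timeout = 300;     /* seconds, existing GUC */

    double  target_write_time = checkpoint_smoothing * checkpoint_timeout;
    /* = 150 s here, regardless of how fast the disks are */

With a rate in KB/s there's no similarly obvious default; a value that
throttles a laptop nicely does next to nothing on a big RAID array.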

How would a min and max rate work?

Anyone else have an opinion on the parameters?

> The flow of control and responsibility between xlog.c, bgwriter.c and
> bufmgr.c seems to have reached the breaking point of unintelligibility.
> Can you think of a refactoring that would simplify it? We might have
> to resign ourselves to this much messiness, but I'd rather not...

The problem we're trying to solve is doing a checkpoint while running
the normal bgwriter activity at the same time. The normal way to do two
things simultaneously is to have two different processes (or threads). I
thought about having a separate checkpoint process, but I gave up on
that thought because the communication needed between backends, bgwriter
and the checkpointer seems like a mess. The checkpointer would need the
pendingOpsTable so that it can do the fsyncs, and it would also need to
receive the forget messages for that table. We could move that table
entirely to the checkpointer, but since bgwriter is presumably doing
most of the writes, there would be a lot of chatter between bgwriter and
the checkpointer.

The current approach is like co-operative multitasking: BufferSync
yields control to bgwriter every now and then.
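
In rough outline the write loop looks something like this (names are
approximate; buffer_needs_checkpoint_write() stands in for the real
BM_CHECKPOINT_NEEDED check):

static void
BufferSync(bool immediate)
{
    int         buf_id;

    for (buf_id = 0; buf_id < NBuffers; buf_id++)
    {
        if (!buffer_needs_checkpoint_write(buf_id))
            continue;

        SyncOneBuffer(buf_id, false);

        if (!immediate)
            CheckpointWriteDelay();     /* run normal bgwriter activity and nap */
    }
}

So bufmgr.c never blocks for long stretches; between writes it hands
control back to bgwriter.c, which does one round of LRU cleaning,
absorbs fsync requests, and sleeps to spread the I/O over the target
duration.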

The division of labor between xlog.c and other modules is not that bad,
IMO. CreateCheckPoint is the main entry point to create a checkpoint,
and it calls other modules to do their stuff, including bufmgr.c.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
