Re: Spread checkpoint sync

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Spread checkpoint sync
Date: 2011-01-17 01:42:13
Message-ID: AANLkTimZR4qEmao7m1i+FAvv-z41ZC1nK0oUzxnRuA3G@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 16, 2011 at 7:32 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> But since you already wrote a patch to do the whole thing, I figured
> I'd time it.

Thanks!

> I arranged to test an instrumented version of your patch under large
> shared_buffers of 4GB, conditions that would maximize the opportunity
> for it to take a long time.  Running your compaction to go from 524288
> to a handful (14 to 29, depending on run) took between 36 and 39
> milliseconds.
>
> For comparison, doing just the memcpy part of AbsorbFsyncRequest on
> a full queue took from 24 to 27 milliseconds.
>
> They are close enough to each other that I am no longer interested in
> partial deduplication.  But both are long enough that I wonder if
> having a hash table in shared memory that is kept unique automatically
> at each update might not be worthwhile.

There are basically three operations that we care about here: (1) time
to add an fsync request to the queue, (2) time to absorb requests from
the queue, and (3) time to compact the queue. The first is by far the
most common, and at least in any situation that anyone's analyzed so
far, the second will be far more common than the third. Therefore, it
seems unwise to accept any slowdown in #1 to speed up either #2 or #3,
and a hash table probe is definitely going to be slower than what's
required to add an element under the status quo.
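
To make the trade-off concrete, here is a minimal, hypothetical sketch of
the status-quo insert path - a simplified stand-in, not the actual
md.c/bgwriter code; the FsyncRequest/FsyncQueue structs and the pthread
mutex below merely play the roles of the real shared-memory queue and
BgWriterCommLock. The point is that operation #1 is one lock acquisition
plus a struct copy, so any dedup-on-insert scheme pays at least an extra
hash probe on this hot path.

/*
 * Hypothetical model of the insert path (operation #1).  One slot per
 * shared buffer, as in Jeff's 4GB test (524288 buffers).
 */
#include <pthread.h>
#include <stdbool.h>

#define FSYNC_QUEUE_SIZE 524288

typedef struct FsyncRequest
{
    int     rel_node;           /* stand-in for the real RelFileNode */
    int     segno;              /* which segment needs an fsync */
} FsyncRequest;

typedef struct FsyncQueue
{
    pthread_mutex_t lock;       /* plays the role of BgWriterCommLock */
    int             num_requests;
    FsyncRequest    requests[FSYNC_QUEUE_SIZE];
} FsyncQueue;

/* Hot path: every backend that needs an fsync goes through here. */
static bool
forward_fsync_request(FsyncQueue *q, const FsyncRequest *req)
{
    bool    ok = true;

    pthread_mutex_lock(&q->lock);
    if (q->num_requests >= FSYNC_QUEUE_SIZE)
        ok = false;             /* queue full: caller fsyncs for itself */
    else
        q->requests[q->num_requests++] = *req;  /* one struct copy */
    pthread_mutex_unlock(&q->lock);

    return ok;
}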

We could perhaps mitigate this by partitioning the hash table.
Alternatively, we could split the queue in half and maintain a global
variable - protected by the same lock - indicating which half is
currently open for insertions. The background writer would grab the
lock, flip the global, release the lock, and then drain the half not
currently open to insertions; the next iteration would flush the other
half. However, it's unclear to me that either of these things has any
value. I can't remember any reports of contention on the
BgWriterCommLock, so changing the logic as minimally as possible seems
like the way to go.
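
For illustration only, here is a sketch of that split-queue idea under the
same toy model (it reuses the FsyncRequest type from the sketch above; the
function names are made up, not real PostgreSQL symbols). The flag flip
happens under the lock, but the drain does not, so backends can keep
inserting into the newly opened half while the closed half is absorbed.

#define HALF_QUEUE_SIZE (524288 / 2)

typedef struct DoubleBufferedQueue
{
    pthread_mutex_t lock;
    int             active_half;        /* 0 or 1: which half takes inserts */
    int             num_requests[2];
    FsyncRequest    requests[2][HALF_QUEUE_SIZE];
} DoubleBufferedQueue;

/* Backends insert into whichever half is currently open. */
static bool
insert_request(DoubleBufferedQueue *q, const FsyncRequest *req)
{
    bool    ok = true;

    pthread_mutex_lock(&q->lock);
    int     half = q->active_half;
    if (q->num_requests[half] >= HALF_QUEUE_SIZE)
        ok = false;
    else
        q->requests[half][q->num_requests[half]++] = *req;
    pthread_mutex_unlock(&q->lock);

    return ok;
}

/*
 * Background writer: flip which half is open while holding the lock, then
 * drain the closed half at leisure.  The next call flushes the other half.
 */
static void
absorb_fsync_requests(DoubleBufferedQueue *q)
{
    pthread_mutex_lock(&q->lock);
    int     closed = q->active_half;
    q->active_half = 1 - closed;        /* open the other half for inserts */
    pthread_mutex_unlock(&q->lock);

    for (int i = 0; i < q->num_requests[closed]; i++)
    {
        /* remember_fsync_request(&q->requests[closed][i]); -- hypothetical */
    }
    q->num_requests[closed] = 0;        /* safe: only the bgwriter touches this half now */
}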

(In contrast, note that the WAL insert lock, proc array lock, and lock
manager/buffer manager partition locks are all known to be heavily
contended in certain workloads.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
