Re: measuring lwlock-related latency spikes

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: measuring lwlock-related latency spikes
Date: 2012-03-31 08:53:51
Message-ID: CA+U5nMJBsGKKo6VmKqKy4gzHkU-XcGs6n_vt7t094xJCucLHZA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 31, 2012 at 4:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> which means, if I'm not
> confused here, that every single lwlock-related stall > 1s happened
> while waiting for a buffer content lock.  Moreover, each event
> affected a different buffer.  I find this result so surprising that I
> have a hard time believing that I haven't screwed something up, so if
> anybody can check over the patch and this analysis and suggest what
> that thing might be, I would appreciate it.

Possible candidates are

1) pages on the RHS of the PK index on accounts. When the page splits,
a new buffer is allocated and the contention moves to the new
buffer. Given so few stalls, I'd say this is the block one level above
the leaf level.

2) Buffer writes hold the content lock in shared mode, so a delayed
I/O during checkpoint on a page that another backend wants to write
would show up as a wait for a content lock. That might happen to
updates where the checkpoint write occurs between the search and write
portions of the update.

The next logical step in measuring lock waits is to track the reason
for the lock wait, not just the lock wait itself.
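
As a rough illustration of what I mean by tracking the reason, here is
a minimal sketch in Python rather than a patch against the lwlock code.
Everything in it is hypothetical: the ReasonTrackingLock class, the
demo() helper and the reason labels ("checkpoint-write", "heap-update")
are made up for the example, and a plain mutex stands in for a buffer
content lock, so shared vs. exclusive mode is not modelled. The only
point is that the caller passes a reason when it acquires the lock and
the blocked time is accumulated per reason.

```python
import threading
import time
from collections import defaultdict


class ReasonTrackingLock:
    """Hypothetical mutex wrapper that records blocked time per reason."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats_guard = threading.Lock()  # protects the dicts below
        self.wait_count = defaultdict(int)
        self.wait_total = defaultdict(float)
        self.wait_max = defaultdict(float)

    def acquire(self, reason):
        """Take the lock; return seconds spent blocked (0.0 if uncontended)."""
        if self._lock.acquire(blocking=False):
            return 0.0  # fast path: no wait, nothing to record
        start = time.monotonic()
        self._lock.acquire()
        waited = time.monotonic() - start
        with self._stats_guard:
            self.wait_count[reason] += 1
            self.wait_total[reason] += waited
            self.wait_max[reason] = max(self.wait_max[reason], waited)
        return waited

    def release(self):
        self._lock.release()


def demo(hold_seconds=0.05):
    """Hold the lock as a slow 'checkpoint write' while an 'update' waits."""
    lock = ReasonTrackingLock()
    about_to_wait = threading.Event()

    def updater():
        about_to_wait.set()
        lock.acquire("heap-update")  # blocks until the holder releases
        lock.release()

    lock.acquire("checkpoint-write")  # uncontended, so nothing is recorded
    t = threading.Thread(target=updater)
    t.start()
    about_to_wait.wait()
    time.sleep(hold_seconds)  # simulated delayed I/O while holding the lock
    lock.release()
    t.join()
    return lock
```

Calling demo() and then looking at wait_count, wait_total and wait_max
on the returned object gives the waits broken down by reason, which is
the kind of breakdown that would help tell the two candidates above
apart.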

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
