Re: measuring lwlock-related latency spikes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: measuring lwlock-related latency spikes
Date: 2012-04-01 01:29:43
Message-ID: CA+Tgmob3PbqQ8x6ib49SVX-uYiapTjnZ+rQ4az-akMqXmE1V0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 31, 2012 at 6:01 PM, Greg Stark <stark(at)mit(dot)edu> wrote:
> On Sat, Mar 31, 2012 at 10:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Isn't that lock held while doing visibility checks?
>>
>> Nope.  heap_update() and friends do a very complicated little dance to
>> avoid that.
> ...
>>> What about I/O
>>> waiting for a clog page to be read?
>>
>> I'm pretty sure that can happen
>
> I'm confused because i thought these two sentences were part of
> describing the same case.

Oh, I thought you were talking about the visibility *map*. Sorry.

>> because TransactionIdIsCommitted()
>> can get called from HeapTupleSatisfies*() which pretty much only gets
>> called while holding the page lock.  I don't know whether it's the
>> cause of these particular stalls, but it's plausible if the CLOG cache
>> is getting thrashed hard enough.
>
> I wonder if it would make sense to, if we come across an xid that
> isn't in the slru release the lock while we read in the clog page.
> When we reobtain it we can check if the LSN has changed and if it has
> restart the visibility checks. If it hasn't pick up where we left off.

I've discovered a bug in my code that was causing it to print at most
2 histogram buckets per lwlock, which obviously means that my previous
results were totally inaccurate. Ah, the joys of benchmarking. I
found the problem when I added code to log a message any time an
lwlock wait exceeded a certain time threshold, and it fired far more
often than the previous results would have indicated. In particular,
it turns out that long waits for WALInsertLock are extremely common
and not, as the previous results appeared to indicated, unheard-of.
I'm rerunning my tests now and will post the updated,
hopefully-accurate results when that's done.

/me attempts to remove egg from face.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-04-01 01:46:02 Re: Command Triggers patch v18
Previous Message Robert Haas 2012-04-01 00:52:10 Re: new group commit behavior not helping?