Re: CPU spikes and transactions

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Dave Owens <dave(at)teamunify(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Julien Cigar <jcigar(at)ulb(dot)ac(dot)be>, Tony Kay <tony(at)teamunify(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: CPU spikes and transactions
Date: 2014-05-14 16:28:30
Message-ID: CAMkU=1xdDOEfGCAOX09DBh3n5r_=j+S9mpqrJnV02x3RZkUdBA@mail.gmail.com
Lists: pgsql-performance

On Tue, May 13, 2014 at 4:04 PM, Dave Owens <dave(at)teamunify(dot)com> wrote:

> Hi,
>
> Apologies for resurrecting this old thread, but it seems like this is
> better than starting a new conversation.
>
> We are now running 9.1.13 and have doubled the CPU and memory. So 2x 16
> Opteron 6276 (32 cores total), and 64GB memory. shared_buffers set to 20G,
> effective_cache_size set to 40GB.
>
> We were able to record perf data during the latest incident of high CPU
> utilization. perf report is below:
>
> Samples: 31M of event 'cycles', Event count (approx.): 16289978380877
>   44.74%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
>   15.03%  postmaster  postgres           [.] 0x00000000002ea937
>    3.14%  postmaster  postgres           [.] s_lock
>    2.30%  postmaster  [kernel.kallsyms]  [k] compaction_alloc
>    2.21%  postmaster  postgres           [.] HeapTupleSatisfiesMVCC
>

compaction_alloc points to the "transparent huge pages" kernel problem,
while HeapTupleSatisfiesMVCC points to the problem of each backend taking
ProcArrayLock for every not-yet-committed tuple it encounters. I don't
know which of those leads to the _spin_lock_irqsave. Transparent huge
pages seem the more likely cause, but perhaps both contribute.

If it is the former, you can find other messages on this list about
disabling it. If it is the latter, your best bet is to commit your bulk
inserts as soon as possible (this might be improved for 9.5, if we can
figure out how to test the alternatives). Please let us know what works.
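
One way to keep the number of not-yet-committed tuples small is to commit a
bulk load in chunks rather than holding one long transaction open. The
following is only a minimal sketch under stated assumptions: the helper names
(chunks, load_in_batches), the table t, and the batch size are hypothetical,
and the stdlib sqlite3 module is used purely so the example runs on its own.
Against PostgreSQL the same loop should work with any DB-API driver, with that
driver's placeholder syntax in the INSERT statement.

```python
import sqlite3
from itertools import islice


def chunks(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(conn, insert_sql, rows, batch_size=1000):
    """Insert rows through a DB-API connection, committing after each batch.

    Returns the number of commits issued. Committing per batch limits how
    many uncommitted tuples other sessions can run into at any moment.
    """
    commits = 0
    cur = conn.cursor()
    for batch in chunks(rows, batch_size):
        cur.executemany(insert_sql, batch)
        conn.commit()
        commits += 1
    return commits


# Stand-in demo: an in-memory SQLite table instead of a PostgreSQL one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
n_commits = load_in_batches(
    conn,
    "INSERT INTO t VALUES (?, ?)",
    ((i, str(i)) for i in range(2500)),
    batch_size=1000,
)
row_count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
```

Whether per-batch commits are acceptable depends on whether the load has to be
atomic; if it does, the alternative is simply to issue the COMMIT immediately
after the last insert rather than leaving the transaction idle.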

If lowering shared_buffers works, I wonder if disabling transparent huge
page compaction might let you bring shared_buffers back up again.
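
Before changing anything, it may help to confirm what the kernel is currently
doing. The sketch below reads the transparent huge page settings from sysfs.
The two directories are the mainline location and the one used by RHEL/CentOS 6
kernels; which one exists on a given machine (if either) is an assumption to
check, and the function names are hypothetical. The active value is the one
shown in square brackets, e.g. "[always] madvise never".

```python
from pathlib import Path

# Candidate sysfs locations; only the ones that exist are reported.
THP_DIRS = [
    Path("/sys/kernel/mm/transparent_hugepage"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage"),
]


def active_choice(text):
    """Return the bracketed token from a line such as '[always] madvise never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_status():
    """Map each existing 'enabled'/'defrag' file to its active setting."""
    status = {}
    for directory in THP_DIRS:
        for name in ("enabled", "defrag"):
            f = directory / name
            if f.is_file():
                status[str(f)] = active_choice(f.read_text())
    return status


if __name__ == "__main__":
    for path, value in thp_status().items():
        print(path, "->", value)
```

If "defrag" reports "always", compaction can run in the allocation path, which
would be consistent with compaction_alloc showing up in the profile. Changing
the setting means writing the desired value to the same file as root.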

Cheers,

Jeff
