Re: CPU spikes and transactions

From: Dave Owens <dave(at)teamunify(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Julien Cigar <jcigar(at)ulb(dot)ac(dot)be>, Tony Kay <tony(at)teamunify(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: CPU spikes and transactions
Date: 2014-05-13 23:04:50
Message-ID: CA+OQrzjQYNmCvRSzhZCKS_XYuqZLQGUuVoDDa564_fMxGA7rKw@mail.gmail.com

Hi,

Apologies for resurrecting this old thread, but it seems better than starting
a new conversation.

We are now running 9.1.13 and have doubled the CPU and memory: two 16-core
Opteron 6276 CPUs (32 cores total) and 64GB of RAM. shared_buffers is set to
20GB, effective_cache_size to 40GB.
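
For reference, a quick way to confirm the values actually in effect (assumes
psql can connect to the cluster with its default connection settings):

    psql -c "SHOW shared_buffers;"         # expect 20GB
    psql -c "SHOW effective_cache_size;"   # expect 40GB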

We were able to record perf data during the latest incident of high CPU
utilization. The perf report is below (a sketch of how such a profile can be
captured follows the listing):

Samples: 31M of event 'cycles', Event count (approx.): 16289978380877
    44.74%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
    15.03%  postmaster  postgres           [.] 0x00000000002ea937
     3.14%  postmaster  postgres           [.] s_lock
     2.30%  postmaster  [kernel.kallsyms]  [k] compaction_alloc
     2.21%  postmaster  postgres           [.] HeapTupleSatisfiesMVCC
     1.75%  postmaster  postgres           [.] hash_search_with_hash_value
     1.25%  postmaster  postgres           [.] ExecScanHashBucket
     1.20%  postmaster  postgres           [.] SHMQueueNext
     1.05%  postmaster  postgres           [.] slot_getattr
     1.04%  init        [kernel.kallsyms]  [k] native_safe_halt
     0.73%  postmaster  postgres           [.] LWLockAcquire
     0.59%  postmaster  [kernel.kallsyms]  [k] page_fault
     0.52%  postmaster  postgres           [.] ExecQual
     0.40%  postmaster  postgres           [.] ExecStoreTuple
     0.38%  postmaster  postgres           [.] ExecScan
     0.37%  postmaster  postgres           [.] check_stack_depth
     0.35%  postmaster  postgres           [.] SearchCatCache
     0.35%  postmaster  postgres           [.] CheckForSerializableConflictOut
     0.34%  postmaster  postgres           [.] LWLockRelease
     0.30%  postmaster  postgres           [.] _bt_checkkeys
     0.28%  postmaster  libc-2.12.so       [.] memcpy
     0.27%  postmaster  [kernel.kallsyms]  [k] get_pageblock_flags_group
     0.27%  postmaster  postgres           [.] int4eq
     0.27%  postmaster  postgres           [.] heap_page_prune_opt
     0.27%  postmaster  postgres           [.] pgstat_init_function_usage
     0.26%  postmaster  [kernel.kallsyms]  [k] _spin_lock
     0.25%  postmaster  postgres           [.] _bt_compare
     0.24%  postmaster  postgres           [.] pgstat_end_function_usage

...please let me know if we need to produce the report differently to be
useful.
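
For anyone who wants to reproduce this kind of profile, here is roughly how a
system-wide capture can be done with perf (a sketch only; the exact invocation
we used may have differed, and the 60-second window is just an example):

    # sample all CPUs for 60 seconds while the spike is happening; -g adds call graphs
    perf record -a -g -o perf.data -- sleep 60
    # summarize, grouped by command, shared object, and symbol
    perf report -i perf.data --sort comm,dso,symbol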

We will begin reducing shared_buffers incrementally over the coming days.
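
For what it's worth, a sketch of what one step of that reduction might look
like (the values below are examples, not our final numbers; on 9.1 this means
editing postgresql.conf and restarting, since shared_buffers cannot be changed
on the fly):

    # in postgresql.conf (example values only):
    #   shared_buffers = 8GB            # down from 20GB
    #   effective_cache_size = 48GB     # OS page cache grows as shared_buffers shrinks
    # then restart the cluster, e.g.:
    pg_ctl restart -D "$PGDATA" -m fast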

Dave Owens

541-359-2602
TU Facebook <https://www.facebook.com/teamunify> | Free OnDeck Mobile Apps <http://www.teamunify.com/__corp__/ondeck/>

On Tue, Oct 15, 2013 at 8:14 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Tue, Oct 15, 2013 at 12:26 PM, Julien Cigar <jcigar(at)ulb(dot)ac(dot)be> wrote:
> > On Tue, Oct 15, 2013 at 08:59:08AM -0700, Tony Kay wrote:
> >> On Mon, Oct 14, 2013 at 4:42 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> >>
> >> > On 15.10.2013 01:00, Tony Kay wrote:
> >> > > Hi,
> >> > >
> >> > > I'm running 9.1.6 w/22GB shared buffers, and 32GB overall RAM on a
> >> > > 16 Opteron 6276 CPU box. We limit connections to roughly 120, but
> >> > > our webapp is configured to allocate a thread-local connection, so
> >> > > those connections are rarely doing anything more than half the time.
> >> >
> >> > Lower your shared buffers to about 20% of your RAM, unless you've
> tested
> >> > it's actually helping in your particular case. It's unlikely you'll
> get
> >> > better performance by using more than that, especially on older
> >> > versions, so it's wiser to leave the rest for page cache.
> >> >
> >> > It might even be one of the causes of the performance issue you're
> >> > seeing, as shared buffers are not exactly overhead-free.
> >> >
> >> > See this for more details on tuning:
> >> >
> >> > http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
> >>
> >>
> >> I had followed the general directions from several sources years ago,
> which
> >> indicate up to 40% of RAM. We've been running very large shared buffers
> for
> >
> > in general it's best to start with 10-15% of the RAM and no more than
> > 2-4 GB
> >
> >> 4 years now, but it is difficult to generate a good real load without
> >> testing against users, so we have not felt the need to move it around.
> In
> >> general, I don't tend to tinker with a setting that has been fine for
> this
> >> long without good reason. I've been wanting to upgrade to the newer
> >> mmap-based versions of pgsql, but was waiting to re-tune this when I
> did so.
> >>
> >> Why do you suspect that shared_buffers would cause the behavior I'm
> seeing?
> >>
> >
> > for two reasons:
> >
> > - some of the overhead of bgwriter and checkpoints is more or less linear
> > in the size of shared_buffers (for example, a large quantity of data
> > could be dirty when a checkpoint occurs).
> >
> > - the OS cache is also being used for reads and writes, the larger
> > shared_buffers is, the more you risk double buffering (same blocks
> > in the OS cache and in the database buffer cache).
>
> That's good reasoning but is not related to the problem faced by the
> OP. The real reason I recommend keeping shared buffers at a max of
> 2GB, always, is that we have major contention issues, which we
> presume are in the buffer area (either in the mapping or the clock
> sweep) but could be something else entirely. These issues tend to
> show up on fast machines with all- or mostly-read workloads.
>
> We are desperate for profiles demonstrating the problem in production
> workloads. If OP is willing to install and run perf in production
> (which is not a bad idea anyway), then my advice is to change nothing
> until we have a chance to grab a profile. These types of problems are
> notoriously difficult to reproduce in test environments.
>
> merlin
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
