Re: Horizontal scalability/sharding

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Horizontal scalability/sharding
Date: 2015-09-01 18:59:11
Message-ID: 55E5F57F.2090902@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 09/01/2015 08:22 PM, Andres Freund wrote:
> On 2015-09-01 14:11:21 -0400, Robert Haas wrote:
>> On Tue, Sep 1, 2015 at 2:04 PM, Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>> Memory bandwidth, for example. It's quite difficult to spot, because the
>>> intuition is that memory is fast, but thanks to improvements in storage (and
>>> stagnation in RAM bandwidth), this is becoming a significant issue.
>>
>> I'd appreciate any tips on how to spot problems of this type. But
>> it's my impression that perf, top, vmstat, and other Linux performance
>> tools will count time spent waiting for memory as CPU time, not idle
>> time. If that's correct, that wouldn't explain workloads where CPU
>> utilization doesn't reach 100%. Rather, it would show up as CPU time
>> hitting 100% while tps remains low.
>
> Yea.
>
> -e bus-cycles is a good start to measure where bus traffic is
> relevant. Depending on the individual cpu other events can be helpful.

The long story: https://people.freebsd.org/~lstewart/articles/cpumemory.pdf

It's from 2007 and only explains oprofile (chapter 7), which is mostly
abandoned in favor of perf nowadays, but perf can produce similar stats,
so the discussion is still valid. It also covers cachegrind (a Valgrind
tool).
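
FWIW, a cachegrind run might look something like this - the binary and
arguments are just placeholders, and expect the run to be much slower
than native:

    # simulate the CPU caches while running the workload
    # (placeholder binary/arguments - adjust for your setup)
    valgrind --tool=cachegrind ./my_test_program

    # annotate the results per function / source line
    cg_annotate cachegrind.out.<pid>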

perf examples: http://www.brendangregg.com/perf.html

Most of the examples with "CPU" in the comment are relevant. Usually
"perf stat" and "perf stat -d" are good starting points - once you see a
lot of LLC misses or too few instructions per cycle (i.e. lots of
stalled cycles), that's a sign of memory bandwidth problems.
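
For illustration, something along these lines (the PID is just a
placeholder for a backend running the workload):

    # basic counters for a running backend (hypothetical PID)
    perf stat -p 12345 -- sleep 10

    # -d adds cache-related counters (L1-dcache, LLC loads/misses)
    perf stat -d -p 12345 -- sleep 10

    # bus-cycles, as Andres mentioned, to see bus traffic
    perf stat -e bus-cycles,cycles,instructions -p 12345 -- sleep 10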

Sadly, this is partially caused by our volcano-style executor, and
sharding alone can do nothing about that.

>
>>> Process-management overhead is another thing we tend to ignore, but once you
>>> get to many processes all willing to work at the same time, you need to
>>> account for that.
>>
>> Any tips on spotting problems in that area?
>
> Not perfect, but -e context-switches (general context switches) and -e
> syscalls:sys_enter_semop (for postgres enforced context switches) is
> rather useful when combined with --call-graph dwarf ('fp' sometimes
> doesn't see through libc which is most of the time not compiled with
> -fno-omit-frame-pointer).

Right, this is about the best I'm aware of.

The problem often is not the number of context switches per se, but the
fact that all the processes share the same (very limited) CPU caches.
Each process dirties the caches for the other processes, lowering the
hit ratios, which can be spotted using the commands above.
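
To tie that to concrete commands, roughly something like this
(system-wide, 10-second samples; adjust as needed):

    # where do the context switches come from
    perf record -e context-switches --call-graph dwarf -a -- sleep 10

    # semop calls, i.e. context switches forced by postgres itself
    perf record -e syscalls:sys_enter_semop --call-graph dwarf -a -- sleep 10

    perf report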

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
