Re: Proposal of tunable fix for scalability of 8.4

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proposal of tunable fix for scalability of 8.4
Date: 2009-03-13 03:00:38
Message-ID: alpine.GSO.2.01.0903122152120.16050@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, 12 Mar 2009, Scott Carey wrote:

> Furthermore, if the problem was due to too much concurrency in the
> database with active connections, its hard to see how changing the lock
> code would change the result the way it did ?

What I wonder about is if the locking mechanism is accidentally turning
into a CPU resource scheduling problem on this benchmark. If the
connections were pooled instead, control over that scheduling would be
more explicit, because connections would more directly map onto physical
CPUs. What if the fall-off is because the sum of the working code set
here is simply exceeding the sum of the CPU caching available once the
number of active connections gets big enough? The real problem could be
that the connections waiting on ProcArray are just falling out of cache,
such that when they do wake up they take a while to page back in and keep
going.

I wouldn't actually bet anything on that theory though, or any of the
others offered here. I find wandering into performance bottleneck
analysis presuming you know what's going on to be dangerous. The bigger
issue here is that Jignesh is using a configuration known to be
problematic (lots of connections), which introduces some uncertaintly
about the true root cause here. Whether it's well founded or not, it
still hurts his case.

And to step back for a second, after reading up on it again I see that
Sun's internal iGen-OLTP benchmark "stresses lock management and
connectivity"[1], which makes me wonder even more than I did before about
how specific this fix is to this workload.

[1] http://blogs.sun.com/bmseer/entry/t2000_adds_database_leadership_to

> First just run a test with a tiny delay (5ms? 0?) and fewer users to
> compare.  If your theory that a connection pooler would help, that test
> would provide higher throughput with low user count and not be lock
> limited.

If the symptoms stay the same but are just scaled to a much lower
connection count, that might help rule out some types of context switching
and caching problem from the list of most likely suspects. Might as well
make it 0ms to minimize the number of connections.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
>From pgsql-performance-owner(at)postgresql(dot)org Fri Mar 13 01:23:45 2009
Received: from localhost (unknown [200.46.208.211])
by mail.postgresql.org (Postfix) with ESMTP id 7C52663584A
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>; Fri, 13 Mar 2009 01:23:44 -0300 (ADT)
Received: from mail.postgresql.org ([200.46.204.86])
by localhost (mx1.hub.org [200.46.208.211]) (amavisd-maia, port 10024)
with ESMTP id 46725-06-4
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>;
Fri, 13 Mar 2009 01:23:39 -0300 (ADT)
X-Greylist: from auto-whitelisted by SQLgrey-1.7.6
Received: from westnet.com (westnet.com [216.187.52.2])
by mail.postgresql.org (Postfix) with ESMTP id 87C6763EF14
for <pgsql-performance(at)postgresql(dot)org>; Fri, 13 Mar 2009 00:31:06 -0300 (ADT)
Received: from westnet.com (localhost [127.0.0.1])
by westnet.com (8.14.0/8.14.0) with ESMTP id n2D3V5tP029355;
Thu, 12 Mar 2009 23:31:05 -0400 (EDT)
Received: from localhost (gsmith(at)localhost)
by westnet.com (8.14.0/8.13.2/Submit) with ESMTP id n2D3V5QC029351;
Thu, 12 Mar 2009 23:31:05 -0400 (EDT)
X-Authentication-Warning: westnet.com: gsmith owned process doing -bs
Date: Thu, 12 Mar 2009 23:31:05 -0400 (EDT)
From: Greg Smith <gsmith(at)gregsmith(dot)com>
X-X-Sender: gsmith(at)westnet(dot)com
To: "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
cc: Scott Carey <scott(at)richrelevance(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,
Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>,
"pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Proposal of tunable fix for scalability of 8.4
In-Reply-To: <49B9566C(dot)3010708(at)sun(dot)com>
Message-ID: <alpine(dot)GSO(dot)2(dot)01(dot)0903122322250(dot)16050(at)westnet(dot)com>
References: <C5DE98EC(dot)3381%scott(at)richrelevance(dot)com> <49B9566C(dot)3010708(at)sun(dot)com>
User-Agent: Alpine 2.01 (GSO 1184 2008-12-16)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=0 tagged_above=0 required=5 tests=none
X-Spam-Level:
X-Archive-Number: 200903/148
X-Sequence-Number: 33070

On Thu, 12 Mar 2009, Jignesh K. Shah wrote:

> That said the testkit that I am using is a lightweight OLTP typish
> workload which a user runs against a preknown schema and between various
> transactions that it does it emulates a wait time of 200ms.

After re-reading about this all again at
http://blogs.sun.com/jkshah/resource/pgcon_problems.pdf I remembered I
wanted more info on just what Sun's iGen OLTP does anyway. Here's a
collection of published comments on it that assembles into a reasonably
detailed picture, as long as you're somewhat familiar with what TPC-C
does:

http://blogs.sun.com/bmseer/entry/t2000_adds_database_leadership_to

"The iGEN-OLTP 1.5 benchmark is a SUN internally developed transaction
processing database workload. This workload simulates a light-weight
Global Order System that stresses lock management and connectivity."

http://www.mysqlperformanceblog.com/2008/02/27/a-piece-of-sunmysql-marketing/#comment-246663

"The iGen workload was created from actual customer workloads and has a
lot more complexity than Sysbench which only test very simple operations
one at a time. The iGen database consist of 6 tables and its executes a
combination of light, medium and heavy transactions."

http://www.sun.com/third-party/global/oracle/collateral/T2000_Oracle_iGEN_05-12-06.pdf?null

"The iGEN-OLTP benchmark is a stress and performance test, measuring the
throughput and simultaneous user connections of an OLTP database workload.
The iGEN-OLTP workload is based on customer applications and is
constructed as a 2-tier orders database application where three
transactions are executed:

* light read-only query
* medium read-only query
* 'heavy' read and insert operation.

The transactions are comprised of various SQL statements: read-only
selects, joins, update and insert operations. iGen OLTP avoids problems
that plague other OTLP benchmarks like TPC-C. TPC-C has problems with only
using light-weight queries, allowing artificial data partitioning, and
only testing a few database functions. The iGen transactions take almost
twice the computation work compared to the TPC-C transactions."

http://blogs.sun.com/ritu/entry/mysql_benchmark_us_t2_beats

"iGen OLTP avoids problems that plague other OTLP benchmarks like TPC-C.
In particular, it is completely random in table row selections and thus is
difficult to use artificial optimizations. iGen OLTP stresses process and
thread creation, process scheduling, and database commit processing...The
transactions are comprised of various SQL transactions: read-only selects,
joins, inserts and update operations."

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message sathiya psql 2009-03-13 08:28:05 Re: Full statement logging problematic on larger machines?
Previous Message Robert Haas 2009-03-13 01:29:52 Re: Proposal of tunable fix for scalability of 8.4