Re: improving concurrent transaction commit rate

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: improving concurrent transaction commit rate
Date: 2009-03-27 20:42:36
Message-ID: 20090327204236.GT12225@frubble.xen.chris-lamb.co.uk
Lists: pgsql-hackers

On Wed, Mar 25, 2009 at 01:48:03PM -0500, Kenneth Marshall wrote:
> On Wed, Mar 25, 2009 at 05:56:02PM +0000, Sam Mason wrote:
> > On Wed, Mar 25, 2009 at 12:01:57PM -0500, Kenneth Marshall wrote:
> > > Are you sure that you are able to actually drive the load at the
> > > high end of the test regime? You may need to use multiple clients
> > > to simulate the load effectively.
> >
> > Notice that the code is putting things into the background and then
> > waiting for them to finish so there will be multiple clients. Or maybe
> > I'm misunderstanding what you mean.
>
> I did notice how your test harness was designed.

OK, that's turned out to be a good point. I've now written five
different versions and they don't seem to give the results I'm expecting
at all!

Running the tests from another machine seems to slow all of them down.
I'd put this down to the increased latency between server and client,
but I'm not sure how to demonstrate (i.e. "prove", in layman's terms)
this.
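
As a rough way to reason about the latency effect, here is a minimal
sketch of a back-of-the-envelope model. It assumes each client is
synchronous (one commit in flight at a time) and that the server is not
yet the bottleneck; the function name and the timings are hypothetical
and not taken from the test programs.

```python
def max_tps(clients, server_time_s, rtt_s):
    """Upper bound on commits/sec for synchronous clients.

    Assumption: every transaction costs one network round trip (rtt_s)
    plus the server's commit time (server_time_s), and clients do not
    slow each other down.
    """
    per_txn = server_time_s + rtt_s
    return clients / per_txn


if __name__ == "__main__":
    # Hypothetical figures: 4 ms commit, local vs. remote round trip.
    for label, rtt in (("local", 0.0001), ("remote", 0.001)):
        print(label, round(max_tps(8, 0.004, rtt), 1))
```

If the slowdown really is latency, the measured drop when moving to the
remote machine should be roughly in line with the measured round trip
time (e.g. from ping) under this model.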

I've got my original shell based approach, a Python version and three
C versions (fork, pthreads and select based concurrency). The most
scalable, by quite a long way, is the Python version and I don't
understand why. I've plotted the mean transactions per second (and
standard deviation) for all tests in the following SVG file:

http://samason.me.uk/~sam/pg-concurrency/compare.svg
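
For reference, a minimal sketch of how a mean and standard deviation
could be computed per client count. The sample figures are made up, the
`summarise` helper is hypothetical, and this is not necessarily how the
plotted numbers were produced.

```python
import statistics


def summarise(tps_samples):
    """Return (mean, sample standard deviation) of repeated TPS runs."""
    mean = statistics.mean(tps_samples)
    # stdev needs at least two samples; report 0.0 for a single run.
    sd = statistics.stdev(tps_samples) if len(tps_samples) > 1 else 0.0
    return mean, sd


if __name__ == "__main__":
    # Hypothetical results: client count -> TPS from repeated runs.
    runs = {1: [100.0, 110.0, 120.0], 2: [190.0, 210.0, 200.0]}
    for clients, samples in sorted(runs.items()):
        mean, sd = summarise(samples)
        print(clients, mean, sd)
```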

The Python version is pretty linear up to 18 clients and then seems to
hit a wall; all the other versions petered out much earlier. Since I'm
IO bound, I'd expect the shell and C based approaches to perform
similarly, but why is the Python version so much faster? CPU usage was
highest in the shell based version, generally topping out at around 50%
utilisation, while the others topped out at around 35%; so I'd say I
was still IO bound.

The source for the tests is available here:

http://samason.me.uk/~sam/pg-concurrency/concurrent.sh
http://samason.me.uk/~sam/pg-concurrency/concurrent.py
http://samason.me.uk/~sam/pg-concurrency/concurrent-fork.c
http://samason.me.uk/~sam/pg-concurrency/concurrent-pthreads.c
http://samason.me.uk/~sam/pg-concurrency/concurrent-select.c

I think I'm abusing things a bit with my fork based version; it
all seems to work OK but I wouldn't trust this style in real code.
Otherwise, if people have comments about how to improve things I'd be
interested to know.

--
Sam http://samason.me.uk/
