Re: pgbench--new transaction type

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench--new transaction type
Date: 2011-06-11 19:21:46
Message-ID: BANLkTi=adoTxfqyb6Kh2p64Zyyoz4dZv+A@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 29, 2011 at 7:04 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 05/29/2011 03:11 PM, Jeff Janes wrote:
>>
>> If you use "pgbench -S -M prepared" at a scale where all data fits in
>> memory, most of what you are benchmarking is network/IPC chatter, and
>> table locking.
>
> If you profile it, you'll find a large amount of the time is actually spent
> doing more mundane things, like memory allocation.  The network and locking
> issues are really not the bottleneck at all in a surprising number of these
> cases.

I wouldn't expect IPC chatter to show up in profiling, because it
costs wall time, but not CPU time. The time spent might be attributed
to the kernel, or to pgbench, or to nothing at all.

As part of the "Eviscerating the parser" discussion, I made a hack
that had exec_simple_query do nothing but return a dummy
completionTag. So there was no parsing, planning, or execution.
Under this mode, I got 44758 TPS, or 22.3 microseconds/transaction,
which should represent the cost of IPC chatter and pgbench overhead.
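
For illustration, a minimal sketch of that kind of short-circuit, assuming
the 9.1-era exec_simple_query() in src/backend/tcop/postgres.c (this is
illustrative only, not the exact patch; the dummy tag value is arbitrary):

    /*
     * Illustrative short-circuit at the top of exec_simple_query():
     * skip parsing, planning, and execution entirely, and just send a
     * dummy command-completion tag back to the client.  The declarations
     * (COMPLETION_TAG_BUFSIZE, EndCommand, whereToSendOutput) come from
     * the surrounding tcop headers.
     */
    static void
    exec_simple_query(const char *query_string)
    {
        char        completionTag[COMPLETION_TAG_BUFSIZE];

        snprintf(completionTag, sizeof(completionTag), "SELECT 1");
        EndCommand(completionTag, whereToSendOutput);

        /* the original parse/plan/execute body is never reached */
    }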

The breakdown I get, in microseconds per item, is:

53.70  cost of a select and related overhead via -S -M prepared, of which:
22.34  cost of IPC and pgbench round trip, estimated via the discussion above
16.91  cost of the actual execution of the select statement, estimated
       via the newly suggested -P mode
--------
14.45  residual cost (53.70 - 22.34 - 16.91), covering table locking,
       transaction begin and end, plus measurement errors

Because all my tests were single-client, the cost of locking would be
much lower than it would be in contended cases. However, I wouldn't
trust profiling to accurately reflect the locking time anyway, for the
same reason I don't trust it for IPC chatter--wall time is consumed
but not spent on the CPU, so it is not counted by profiling.

As you note, memory allocation consumes much of the profiled time.
However, memory allocation is a low-level operation which is always in
support of some higher purpose, such as parsing, execution, or taking
locks. My top-down approach attempts to assign these low-level costs
to the proper higher-level purpose.

> Your patch isn't really dependent on your being right about the
> cause here, which means this doesn't impact your submissions any.  Just
> wanted to clarify that what people expect are slowing things down in this
> situation and what actually shows up when you profile are usually quite
> different.
>
> I'm not sure whether this feature makes sense to add to pgbench, but it's
> interesting to have it around for developer testing.

Yes, this is what I thought the opinion might be. But there is no
repository of such "useful for developer testing" patches, other than
this mailing list. That being the case, would it even be worthwhile
to fix it up more and submit it to a commitfest?

>> some numbers for single client runs on 64-bit AMD Opteron Linux:
>> 12,567 sps  under -S
>> 19,646 sps  under -S -M prepared
>> 58,165 sps  under -P
>>
>
> 10,000 is too big of a burst to run at once.  The specific thing I'm
> concerned about is what happens if you try this mode when using "-T" to
> enforce a runtime limit, and your SELECT rate isn't high.  If you're only
> doing 100 SELECTs/second because your scale is big enough to be seek bound,
> you could overrun by nearly two minutes.

OK. I wouldn't expect someone to want to use -P at scales that
cause that to happen, but perhaps it should deal with it more
gracefully. I picked 10,000 just because it is obviously large enough
for any hardware I have, or will have for the foreseeable future.

> I think this is just a matter of turning the optimization around a bit.
>  Rather than starting with a large batch size and presuming that's ideal, in
> this case a different approach is really needed.  You want the smallest
> batch size that gives you a large win here.  My guess is that most of the
> gain here comes from increasing batch size to something in the 10 to 100
> range; jumping to 10K is probably overkill.  Could you try some smaller
> numbers and see where the big increases start falling off at?

I've now tried a variety of sizes (powers of 2 up to 8192, plus
10000), and the results fit a linear equation very well: 17.3 usec
per inner select plus 59.0 usec per outer invocation.

So at a loop size of 512, you would have an overhead of 59.0/512 =
0.115 usec out of a total time of 17.4 usec per select, or 0.7%
overhead. So that should be large enough.
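
To make the arithmetic concrete, here is a small standalone check of the
overhead implied by that fit at a few batch sizes (the 17.3 and 59.0 usec
constants are from the fit above; the program itself is just illustrative):

    #include <stdio.h>

    /*
     * Overhead implied by the linear fit: time(n) = 59.0 + 17.3 * n usec
     * for one outer invocation running n inner selects.  The per-select
     * overhead is 59.0/n usec on top of 17.3 usec of real work.
     */
    int
    main(void)
    {
        const double per_select = 17.3;   /* usec per inner select (fit) */
        const double per_call = 59.0;     /* usec per outer invocation (fit) */
        const int   sizes[] = {16, 64, 256, 512, 2048, 10000};
        int         i;

        for (i = 0; i < (int) (sizeof(sizes) / sizeof(sizes[0])); i++)
        {
            double  overhead = per_call / sizes[i];   /* usec per select */
            double  pct = 100.0 * overhead / (per_select + overhead);

            printf("n=%5d  overhead=%6.3f usec/select  (%.2f%%)\n",
                   sizes[i], overhead, pct);
        }
        return 0;
    }

At n = 512 this comes out to 0.115 usec, or about 0.7%, matching the
number above.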

Thanks for the other suggestions.

Cheers,

Jeff
