Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: coelho(at)cri(dot)ensmp(dot)fr, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Date: 2013-07-18 02:58:16
Message-ID: 51E759C8.7000100@2ndQuadrant.com
Lists: pgsql-hackers

On 7/17/13 9:16 PM, Tatsuo Ishii wrote:
> Now suppose we have 3 transactions and each has following values:
>
> d(0) = 10
> d(1) = 20
> d(2) = 30
>
> t(0) = 100
> t(1) = 110
> t(2) = 120
>
> That says pgbench expects the duration 10 for each
> transaction. Actually, the first transaction runs slowly for some
> reason and the lag = 100 - 10 = 90. However, tx(1) and tx(2) are
> finished on schedule because they spend only 10 (110-10 = 10, 120-110
> = 10). So the expected average lag would be 90/3 = 30.

The clients are not serialized here in any significant way, even when
they share a single process/thread. I did many rounds of tracing
through this code with timestamps on each line, and the sequence of
events looks like this:

client 0: send "SELECT..." to server. yield to next client.
client 1: send "SELECT..." to server. yield to next client.
client 2: send "SELECT..." to server. yield to next client.
select(): wait for the first response from any client.
client 0: receive response. complete transaction, compute lag.
client 1: receive response. complete transaction, compute lag.
client 2: receive response. complete transaction, compute lag.

There is nothing here that is queuing the clients one after the other.
If (0) takes 100ms before its reply comes back, (1) and (2) can receive
their replies and continue forward at any time. They are not waiting
for (0); it has yielded control while waiting for a response. All three
timings are independent once you reach the select() point where all are
active.
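
As a rough illustration of that independence, here is a minimal Python
sketch. It is not pgbench's C code: the fake_server() helper, the
socketpair plumbing, and the delay values are all assumptions made up
for the example. Three clients send a query, then one select() loop
picks up whichever replies arrive first.

```python
import select
import socket
import threading
import time


def fake_server(sock, delay):
    """Hypothetical backend: read one query, reply after `delay` seconds."""
    sock.recv(64)
    time.sleep(delay)
    sock.sendall(b"ok")


def run_clients(delays):
    """Send one query per client, then wait on all of them with select()."""
    pairs = [socket.socketpair() for _ in delays]
    threads = []
    for (_, server_end), delay in zip(pairs, delays):
        t = threading.Thread(target=fake_server, args=(server_end, delay))
        t.start()
        threads.append(t)

    # Send phase: each client fires its query and yields to the next one.
    pending = {}
    for client_id, (client_end, _) in enumerate(pairs):
        client_end.sendall(b"SELECT 1")
        pending[client_end] = client_id

    # Wait phase: select() returns whichever clients have a reply ready.
    completion_order = []
    while pending:
        readable, _, _ = select.select(list(pending), [], [])
        for sock in readable:
            sock.recv(64)
            completion_order.append(pending.pop(sock))

    for t in threads:
        t.join()
    for client_end, server_end in pairs:
        client_end.close()
        server_end.close()
    return completion_order


# Client 0 gets a slow reply; clients 1 and 2 get fast ones.
order = run_clients([0.3, 0.05, 0.05])
print(order)  # client 0 finishes last; 1 and 2 do not wait for it
```

Clients 1 and 2 complete while client 0 is still waiting, which is the
point: sending first does not make a client a gate for the others.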

In this situation, if the server gets stuck doing something such that it
takes 100ms before any client receives a response, it is correct to
penalize every client for that latency. All three clients could have
received the information earlier if the server had any to send them. If
they did not, they all were suffering from some sort of lag.
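
To put numbers on the difference, here is a toy Python calculation. It
assumes the definition of lag used in the quoted example (observed
duration minus the expected duration of 10) and start times of 0, 1, 2;
the lags() helper and the finish times are invented for illustration
and are not taken from the patch.

```python
def lags(start_times, finish_times, expected_duration):
    """Lag per client: observed duration minus expected, floored at zero."""
    return [
        max(0, (finish - start) - expected_duration)
        for start, finish in zip(start_times, finish_times)
    ]


def average(values):
    return sum(values) / len(values)


starts = [0, 1, 2]   # assumed: queries go out back to back
expected = 10        # assumed expected duration per transaction

# Case A: only client 0's reply is slow; the others finish on schedule.
one_slow = lags(starts, [100, 11, 12], expected)

# Case B: the server stalls, so no client gets a reply before time 100.
stalled = lags(starts, [100, 100, 100], expected)

print(one_slow, average(one_slow))  # [90, 0, 0] 30.0
print(stalled, average(stalled))    # [90, 89, 88] 89.0
```

Case A gives the 90/3 = 30 average from the quoted example. Case B is
the stalled-server situation, where every client really did wait and
the average lag reflects that.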

I'm not even sure why you spaced the start times out at intervals of 10.
If I were constructing an example like this, I'd have them start at
times of 0, 1, 2--as fast as the CPU can fire off statements
basically--and then start waiting from that point. If client 1 takes 10
units of time to send its query out before client 2 runs, and the rate
goal requires 10 units of time, the rate you're asking for is impossible.
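
One way to read that arithmetic, as a small Python sketch: the
server_budget() name and the units are assumptions for illustration
only. If sending the query consumes the whole interval the rate goal
allows, nothing is left for the server to do the work.

```python
def server_budget(interval, send_cost):
    """Time left for the server round trip once the query has been sent."""
    return interval - send_cost


# Sending costs as much as the whole interval: no room to hit the rate.
print(server_budget(10, 10))  # 0

# Sending is cheap relative to the interval: most of the budget remains.
print(server_budget(10, 1))   # 9
```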

For sorting out what's going on with your two systems, I would recommend
turning on debugging output with "-d" and looking at the new
per-transaction latency numbers that the feature reports. If your
theory that the lag is going up as the test proceeds is true, that
should show up in the individual latency numbers too.

Based on what I saw during weeks of testing here, I would sooner suspect
a system-level difference between your two servers than blame the
latency calculation. I saw a *lot* of weird system issues myself when I
started looking that carefully at sustained
throughput. The latency reports from the perspective of Fabien's code
were always reasonable though. When something delays every client, it
counts that against every active client's lag, and that's the right
thing to do.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
