Re: PATCH: pgbench - random sampling of transaction written into log

From:	"Tomas Vondra" <tv(at)fuzzy(dot)cz>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Tomas Vondra" <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: PATCH: pgbench - random sampling of transaction written into log
Date:	2012-08-30 19:48:24
Message-ID:	3a965133027fca60d8da3d3382d54103.squirrel@sq.gransy.com
Lists:	pgsql-hackers

On 30 August 2012, 17:46, Robert Haas wrote:
> On Sun, Aug 26, 2012 at 1:04 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> Attached is an improved patch, with a call to rand() replaced with
>> getrand().
>>
>> I was thinking about the counter, but I'm not really sure how to handle
>> cases like "39%" - I'm not sure a plain (counter % 100 < 39) is a
>> good sampling, because it always keeps continuous sequences of
>> transactions. Maybe there's a clever way to use a counter, but let's
>> stick to getrand() unless we can prove it's causing issues. Especially
>> considering that a lot of data won't be written at all with low
>> sampling rates.
>
> I like this patch, and I think sticking with a random number is a good
> idea. But I have two suggestions. Number one, I think the sampling
> rate should be stored as a float, not an integer, because I can easily
> imagine wanting a sampling rate that is not an integer percentage -
> especially, one that is less than one percent, like half a percent or
> a tenth of a percent. Also, I suggest that the command-line option
> should be a long option rather than a single character option. That
> will be more mnemonic and avoid using up too many single letter
> options, of which there is a limited supply. So to sample every
> hundredth result, you could do something like this:
>
> pgbench --latency-sample-rate 0.01

Right, I was thinking about that too. I'll do that in the next version of
the patch.
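
To make the difference between the two approaches concrete, here is a minimal sketch, not the patch itself. It assumes a fractional rate in [0, 1] as Robert suggests. SketchRng, sketch_uniform(), sample_random() and sample_counter() are hypothetical names, and the xorshift generator is only a stand-in for whatever getrand() provides in pgbench.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in RNG (xorshift64); NOT pgbench's getrand().
 * The state must be seeded with a nonzero value. */
typedef struct
{
    uint64_t state;
} SketchRng;

/* Returns a uniform double in [0, 1). */
static double
sketch_uniform(SketchRng *rng)
{
    uint64_t x = rng->state;

    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    rng->state = x;

    /* Keep the top 53 bits and scale by 2^53. */
    return (double) (x >> 11) / 9007199254740992.0;
}

/* Random sampling: each transaction is kept independently with
 * probability `rate`, so 0.01 or 0.001 work as well as 0.39. */
static bool
sample_random(SketchRng *rng, double rate)
{
    assert(rate >= 0.0 && rate <= 1.0);
    return sketch_uniform(rng) < rate;
}

/* Counter-based alternative: keeps the first `percent` transactions
 * out of every 100, so the kept ones form contiguous runs and only
 * integer percentages can be expressed. */
static bool
sample_counter(uint64_t counter, unsigned percent)
{
    return (counter % 100) < percent;
}
```

With the counter variant at 39 percent, transactions 0..38 of each hundred are kept and 39..99 are dropped, which is the "continuous sequences" concern from the quoted message. The random variant has no such pattern.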

> Another option I personally think would be useful is an option to
> record only those latencies that are above some minimum bound, like
> this:
>
> pgbench --latency-only-if-more-than $MICROSECONDS
>
> The problem with recording all the latencies is that it tends to have
> a material impact on throughput. Your patch should address that for
> the case where you just want to characterize the latency, but it would
> also be nice to have a way of recording the outliers.

That sounds like a pretty trivial patch. I've been thinking about yet
another option - histograms (regular or with exponential bins).
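
As a rough illustration of what "exponential bins" could mean here, the sketch below assumes latencies in microseconds and power-of-two bin boundaries. All names (SketchHistogram, sketch_bin_index(), sketch_record()) and the bin count are made up for the example, not taken from any patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NBINS 32

/* Hypothetical histogram with power-of-two bin boundaries. */
typedef struct
{
    uint64_t counts[SKETCH_NBINS];
} SketchHistogram;

/* Bin 0 holds latencies of 0 or 1 us; bin k (k >= 1) holds
 * [2^k, 2^(k+1)) us; the last bin absorbs everything larger. */
static size_t
sketch_bin_index(uint64_t latency_us)
{
    size_t bin = 0;

    while (latency_us > 1 && bin < SKETCH_NBINS - 1)
    {
        latency_us >>= 1;
        bin++;
    }
    return bin;
}

/* Record one latency; the per-transaction cost is a few shifts and
 * an increment, with no log line written. */
static void
sketch_record(SketchHistogram *hist, uint64_t latency_us)
{
    assert(hist != NULL);
    hist->counts[sketch_bin_index(latency_us)]++;
}
```

The appeal of this layout is that 32 counters cover everything from microseconds to about an hour, with relative resolution that stays constant across the range. A regular (fixed-width) histogram would need the range to be chosen up front.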

What I'm not sure about is which of these options should be allowed at the
same time - to me, combinations like 'sampling + aggregation' don't make
much sense. The one possible exception is 'latency-only-if-more-than +
aggregation'.

Tomas

In response to

Re: PATCH: pgbench - random sampling of transaction written into log at 2012-08-30 15:46:59 from Robert Haas

Responses

Re: PATCH: pgbench - random sampling of transaction written into log at 2012-08-30 21:44:05 from Robert Haas