Re: PATCH: pgbench - random sampling of transaction written into log

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: pgbench - random sampling of transaction written into log
Date: 2012-08-26 00:48:39
Message-ID: 50397267.3060405@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26.8.2012 00:19, Jeff Janes wrote:
> On Fri, Aug 24, 2012 at 2:16 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> Hi,
>>
>> attached is a patch that adds support for random sampling in pgbench, when
>> it's executed with "-l" flag. You can do for example this:
>>
>> $ pgbench -l -T 120 -R 1 db
>>
>> and then only 1% of transactions will be written into the log file. If you
>> omit the tag, all the transactions are written (i.e. it's backward
>> compatible).
>
> Hi Tomas,
>
> You use the rand() function. Isn't that function not thread-safe?
> Or, if it is thread-safe, does it accomplish that with a mutex? That
> was a problem with a different rand function used in pgbench that
> Robert Haas fixed a while ago, 4af43ee3f165c8e4b332a7e680.

Hi Jeff,

Aha! Good catch. I've used rand() which seems to be neither reentrant or
thread-safe (unless the man page is incorrect). Anyway, using pg_erand48
or getrand seems like an appropriate fix.

> Also, what benefit is had by using modulus on rand(), rather than just
> modulus on an incrementing counter?

Hmm, I was thinking about that too, but I wasn't really sure how would
that behave with multiple SQL files etc. But now I see the files are
actually chosen randomly, so using a counter seems like a good idea.

> Could you explain the value of this patch, given your other one that
> does aggregation? If both were accepted, I think I would always use
> the aggregation one in preference to this one.

The difference is that the sample contains information that is simply
unavailable in the aggregated output. For example when using multiple
files, you can compute per-file averages from the sample, but the
aggregated output contains just a single line for all files combined.

Tomas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-08-26 02:48:59 Re: Timing overhead and Linux clock sources
Previous Message Bruce Momjian 2012-08-25 23:12:47 Re: psql \set vs \copy - bug or expected behaviour?