From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: pgbench - merging transaction logs
Date: 2015-03-20 12:43:14
Message-ID: alpine.DEB.2.10.1503201327170.12124@sto
Lists: pgsql-hackers


Hello Robert,

>> The fprintf we are talking about occurs at most once per pgbench
>> transaction, possibly much less when aggregation is activated, and this
>> transaction involves network exchanges and possibly disk writes on the
>> server.
>
> random() was occurring four times per transaction rather than once,
> but OTOH I think fprintf() is probably a much heavier-weight
> operation.

Yes, sure.

My point is that with many threads and a tremendous TPS, the *detailed*
per-transaction log (aka simple log) is probably a bad choice anyway, and
the aggregated version is the way to go.

Note that even without a mutex, fprintf may be considered a "heavy"
function which is going to slow down the transaction rate significantly.
That could be tested as well.
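
For what it is worth, the bare fprintf cost per line can be estimated with
a micro-benchmark along these lines (just a sketch, not part of the patch;
the line contents are made up):

/*
 * Time N fprintf calls of a pgbench-like log line.
 * Build with e.g.: gcc -O2 fprintf_bench.c -o fprintf_bench
 * (older glibc may need -lrt for clock_gettime)
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int
main(void)
{
    FILE   *log = fopen("bench.log", "w");
    struct timespec t0, t1;
    long    i, n = 1000000;

    if (log == NULL)
        return EXIT_FAILURE;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < n; i++)
        fprintf(log, "%d %ld %d %d %ld %ld\n",
                0, i, 1234, 0, 1426855394L, 123456L);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fclose(log);

    printf("%.0f ns per fprintf\n",
           ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / n);
    return EXIT_SUCCESS;
}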

It is possible to reduce the lock time by preparing the string beforehand
(which would mean introducing buffers) and doing just a "fputs" under the
mutex. That would not reduce the total printing time, though, and it may
add malloc/free operations.
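
For illustration, a minimal sketch of what I have in mind (hypothetical
names, not the actual patch):

#include <stdio.h>
#include <pthread.h>

static pthread_mutex_t logfile_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Format the line outside the critical section, then emit it with a
 * single fputs under the mutex.
 */
static void
log_transaction(FILE *logfile, int client_id, long tx_no,
                long latency_us, long epoch_sec, long epoch_us)
{
    char    buf[128];       /* stack buffer, so no malloc/free */

    snprintf(buf, sizeof(buf), "%d %ld %ld %ld %ld\n",
             client_id, tx_no, latency_us, epoch_sec, epoch_us);

    pthread_mutex_lock(&logfile_mutex);
    fputs(buf, logfile);    /* the only work done under the lock */
    pthread_mutex_unlock(&logfile_mutex);
}

int
main(void)
{
    log_transaction(stdout, 0, 1L, 1234L, 1426855394L, 123456L);
    return 0;
}

Note that a fixed-size stack buffer avoids the malloc/free issue, at the
price of a bounded line length.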

> The way to know if there's a real problem here is to test it, but I'd be
> pretty surprised if there isn't.

Indeed, I think I can contrive a simple example where it is: basically a
more or less empty or read-only transaction (e.g. SELECT 1).
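
For instance, comparing the tps reported with and without logging on such
a script (hypothetical invocation, any test database would do):

  echo "SELECT 1;" > select1.sql
  pgbench -n -f select1.sql -c 16 -j 8 -T 60 test
  pgbench -n -f select1.sql -c 16 -j 8 -T 60 -l test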

My opinion is that there is a tradeoff between code simplicity and later
maintenance on one side, and the benefit of the feature on the other.

If threads are assumed and fprintf is used, the feature is much simpler to
implement and its maintenance is lighter. The alternative implementation
means re-parsing the generated files over and over to merge their
contents.

Also, I do not think that the detailed log provides much benefit with very
fast transactions, for which the aggregate is probably a much better
choice anyway. If the user persists, she may generate per-thread logs and
merge them later, in which case a merge script is needed, but I do not
think that would be a bad thing; see the sketch below.
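
Such a merge could be a simple k-way merge over the per-thread files. A
possible sketch, assuming each file is already ordered by time and that
the epoch timestamp is in the fifth and sixth fields of the simple log
lines (to be adapted to the actual format; file names just an example):

/*
 * Merge per-thread simple logs by timestamp.
 * Usage: merge_logs pgbench_log.1234.* > merged.log
 */
#include <stdio.h>
#include <stdlib.h>

#define MAXFILES 64
#define MAXLINE  256            /* lines are assumed to fit */

/* read the next line of fp and compute its sort key; 0 at EOF */
static int
next_line(FILE *fp, char *line, long long *key)
{
    long long   sec, usec;

    if (fgets(line, MAXLINE, fp) == NULL)
        return 0;
    if (sscanf(line, "%*s %*s %*s %*s %lld %lld", &sec, &usec) != 2)
        return 0;
    *key = sec * 1000000LL + usec;
    return 1;
}

int
main(int argc, char *argv[])
{
    FILE       *in[MAXFILES];
    char        line[MAXFILES][MAXLINE];
    long long   key[MAXFILES];
    int         live[MAXFILES];
    int         i, n = argc - 1;

    if (n < 1 || n > MAXFILES)
        return EXIT_FAILURE;

    for (i = 0; i < n; i++)
    {
        if ((in[i] = fopen(argv[i + 1], "r")) == NULL)
            return EXIT_FAILURE;
        live[i] = next_line(in[i], line[i], &key[i]);
    }

    for (;;)
    {
        int     best = -1;

        /* pick the live input with the smallest timestamp */
        for (i = 0; i < n; i++)
            if (live[i] && (best < 0 || key[i] < key[best]))
                best = i;
        if (best < 0)
            break;              /* all inputs exhausted */
        fputs(line[best], stdout);
        live[best] = next_line(in[best], line[best], &key[best]);
    }
    return EXIT_SUCCESS;
}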

Obviously, all that is only my opinion and is quite debatable.

--
Fabien.
