Re: [PATCH] pgbench log file headers

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Adam Hendel <adam(at)tembo(dot)io>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] pgbench log file headers
Date: 2024-01-06 15:20:20
Message-ID: CALDaNm1gABmcPxuZ_XcDS5zOPn-Rqbp2SNG7Zx7p7O=s0M1CKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 21 Nov 2023 at 09:52, Adam Hendel <adam(at)tembo(dot)io> wrote:
>
> Hello,
>
> On Mon, Nov 13, 2023 at 6:01 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>> Hi,
>>
>> On 2023-11-13 11:55:07 -0600, Adam Hendel wrote:
>> > Currently, pgbench will log individual transactions to a logfile when the
>> > `--log` parameter flag is provided. The logfile, however, does not include
>> > column header. It has become a fairly standard expectation of users to have
>> > column headers present in flat files. Without the header in the pgbench log
>> > files, new users must navigate to the docs and piece together the column
>> > headers themselves. Most industry leading frameworks have tooling built in
>> > to read column headers though, for example python/pandas read_csv().
>>
>> The disadvantage of doing that is that a single pgbench run with --log will
>> generate many files when using -j, to avoid contention. The easiest way to
>> deal with that after the run is to cat all the log files to a larger one. If
>> you do that with headers present, you obviously have a few bogus rows (the
>> heades from the various files).
>>
>> I guess you could avoid the "worst" of that by documenting that the combined
>> log file should be built by
>> cat {$log_prefix}.${pid} {$log_prefix}.${pid}.*
>> and only outputting the header in the file generated by the main thread.
>>
>>
>> > We can improve the experience for users by adding column headers to pgbench
>> > logfiles with an optional command line flag, `--log-header`. This will keep
>> > the API backwards compatible by making users opt-in to the column headers.
>> > It follows the existing pattern of having conditional flags in pgbench’s
>> > API; the `--log` option would have both –log-prefix and –log-header if this
>> > work is accepted.
>>
>> > The implementation considers the column headers only when the
>> > `--log-header` flag is present. The values for the columns are taken
>> > directly from the “Per-Transaction Logging” section in
>> > https://www.postgresql.org/docs/current/pgbench.html and takes into account
>> > the conditional columns `schedule_lag` and `retries`.
>> >
>> >
>> > Below is an example of what that logfile will look like:
>> >
>> >
>> > pgbench postgres://postgres:postgres(at)localhost:5432/postgres --log
>> > --log-header
>> >
>> > client_id transaction_no time script_no time_epoch time_us
>>
>> Independent of your patch, but we imo ought to combine time_epoch time_us in
>> the log output. There's no point in forcing consumers to combine those fields,
>> and it makes logging more expensive... And if we touch this, we should just
>> switch to outputting nanoseconds instead of microseconds.
>>
>> It also is quite odd that we have "time" and "time_epoch", "time_us", where
>> time is the elapsed time of an individual "transaction" and time_epoch +
>> time_us together are a wall-clock timestamp. Without differentiating between
>> those, the column headers seem not very useful, because one needs to look in
>> the documentation to understand the fields.
>>
>>
>> I don't think there's all that strong a need for backward compatibility in
>> pgbench. We could just change the columns as I suggest above and then always
>> emit the header in the "main" log file.
>
>
> I updated the patch to always log the header and only off the main thread. As for the time headers, I will work on renaming/combining those in a separate commit as:

One of the test has failed in CFBOT at[1] with:
[09:15:00.526](0.000s) not ok 422 - transaction count for
/tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_log_3.25193
(11)
[09:15:00.526](0.000s) # Failed test 'transaction count for
/tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_log_3.25193
(11)'
# at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl
line 1257.
[09:15:00.526](0.000s) not ok 423 - transaction format for 001_pgbench_log_3
[09:15:00.526](0.000s) # Failed test 'transaction format for
001_pgbench_log_3'
# at /tmp/cirrus-ci-build/src/bin/pgbench/t/001_pgbench_with_server.pl
line 1257.
# Log entry not matching: client_id transaction_no time script_no
time_epoch time_us
# Running: pgbench --no-vacuum -f
/tmp/cirrus-ci-build/build/testrun/pgbench/001_pgbench_with_server/data/t_001_pgbench_with_server_main_data/001_pgbench_incomplete_transaction_block

More details for the same is available at [2]

[1] - https://cirrus-ci.com/task/5139049757802496
[2] - https://api.cirrus-ci.com/v1/artifact/task/5139049757802496/testrun/build/testrun/pgbench/001_pgbench_with_server/log/regress_log_001_pgbench_with_server

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-01-06 15:30:03 Re: POC: Extension for adding distributed tracing - pg_tracing
Previous Message vignesh C 2024-01-06 15:12:42 Re: Shared detoast Datum proposal