Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)

From: Boris Mironov <boris_mironov(at)outlook(dot)com>
To: Madyshev Egor <E(dot)Madyshev(at)ftdata(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)
Date: 2026-02-23 10:41:19
Message-ID: PH0PR08MB70208182284EBA18F64228798877A@PH0PR08MB7020.namprd08.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Egor,

Here comes v8 of the patch with all requested changes. Sorry it took a little while to implement...

> 1. Perhaps we should rename functions initRowMethodBin to
> initRowMethodBinary, initPopulateTableText to initPopulateTableTextCopy,
> initPopulateTableBinary to initPopulateTableBinaryCopy,
> initGenerateDataClientSideText to initGenerateDataClientSideTextFrmt,
> initGenerateDataClientSideBinary to initGenerateDataClientSideBinaryFrmt
> for better clarity.

Done

> 2. The --help output currently describes only the 'M' mode. Should we
>also add a description for the 'S' mode for completeness?

Fixed. Thanks for noticing

> 3. I`m wondering if the default value for the 'filler' column in
> initCreateTables is necessary? The current functionality seems
> unaffected, so perhaps we could avoid changing this function to keep the
> diff minimal.

That was done purely for debugging. Fixed

> 4. I noticed that vanilla client functionality in M mode is not
> implemented. Is there a specific reason for this? It seems feasible to
> implement by passing a counter, similar to how it`s done in
> initPopulateTableBinary. If there are reasons not to implement it, in
> mode Mg pgbench should not run mode Sg, it just pg_fatal.

I didn't do it initially to keep original logic intact. To unify new patch
with existing codebase I've added multi-transactional functionality
to client-side data generator and extracted "show progress" code
into own procedure. Now both client-side generators use it.

> 5. In mode client binary format generation c, It would be the right
> thing to do implement write progress of generating data and 'quiet' mode,
> as it already implement in client text format generation g.

Everything is unified now via new procedure showPopulateTableCopyProgress

> 6. In bufferData, when len == 1, we call bufferCharData, which already
> increments bin_copy_buffer_length. However, at the end of bufferCharData
> we increment it again, leading to a double increment.

Fixed. Thank you

> 7. I suggest adding column count in function initPopulateTableBinary,
> and initBranchBinary, initTellerBinary, initAccountBinary, to pass it
> from initAccountBinary to init_row, and use it in functions
> initBranchBinary, initTellerBinary, initAccountBinary. This will
> increase the readability of the code, and remove the magic numbers in
> the addColumnCounter call.

Done

> 8. I think check and install data_generation_type in function
> checkInitSteps is not quite right. In the current realization, pgbench
> allows run data generation many times (dtCdtC...), so i suggest do not
> touching this functionality. My suggestion would be to revert all the
> changes from function checkInitSteps, set the data_generation_type in
> switch in runInitSteps and remove call function checkInitSteps
> from main.

I thought about such solution initially, but then decided to implement extra
procedure where I can count how many times we asked to initialize data.
Since this is legacy code and I can see need for ability to run init multiple
times (eg, get exact timing of different methods for comparison or run
same code multiple times to get average timing), I just show warning.

All in all, thanks a lot for your deep analysis of the code and great
suggestions on how to make better.

Best regards,
Boris

Attachment Content-Type Size
v8-pgbench-faster-modes.patch application/octet-stream 116.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Kuzmenkov 2026-02-23 11:01:08 Trouble with ROWID_VAR in custom plan node
Previous Message John Naylor 2026-02-23 10:38:01 Re: refactor architecture-specific popcount code