Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)

From:	Boris Mironov <boris_mironov(at)outlook(dot)com>
To:	Madyshev Egor <E(dot)Madyshev(at)ftdata(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)
Date:	2026-01-30 14:47:54
Message-ID:	PH0PR08MB70206B42F5A79A518B610815889FA@PH0PR08MB7020.namprd08.prod.outlook.com
Lists:	pgsql-hackers

Hello Egor,

Thank you very much for taking this patch under your wing!

> I propose, by analogy with the existing 'g'/'G' modes, to use lowercase
> letters for client-side data generation and uppercase letters for
> server-side generation. Furthermore, I propose considering making the
> "one transaction per scale" mode a separate setting. This would result
> in the following modes:
> 1. g: COPY .. FROM STDIN .. TEXT, single transaction (orig. mode)
> 2. c: COPY .. FROM STDIN .. BINARY, single transaction (added mode)
> 3. G: INSERT .. SELECT generate_series, single transaction (orig. mode)
> 4. I: INSERT .. SELECT unnest, single transaction (added mode)
> And: M: multiple transactions. A setting that, when used, makes a mode
> run with a transaction for each scale instead of a single transaction.
> This would yield 8 possible combinations.

Sure thing. I agree with your proposal to add more flexibility to the
parameters, with a single exception. For the UNNEST mode I would suggest
using "U" instead of "I", as "I" might become confusing later if
another patch from the current CommitFest makes it into master.
I'm referring to:

https://commitfest.postgresql.org/patch/6242/

It uses the "-i" parameter to start populating tables with multiple threads.
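
To make the resulting mode matrix concrete, here is a minimal sketch that
enumerates the combinations, with "U" substituted for "I" as suggested
above. It is only an illustration, not code from the patch: the dictionary
name, the helper function, and the idea of spelling the modifier as a
trailing "M" (e.g. "gM") are all assumptions made for this example.

```python
from itertools import product

# Mode letters from the proposal, with "U" replacing "I" for UNNEST.
# Lowercase = client-side generation, uppercase = server-side generation.
GENERATION_MODES = {
    "g": "COPY .. FROM STDIN .. TEXT",
    "c": "COPY .. FROM STDIN .. BINARY",
    "G": "INSERT .. SELECT generate_series",
    "U": "INSERT .. SELECT unnest",
}


def describe_combinations():
    """Return (spec, description) pairs for every mode, with and without M.

    Appending "M" to the mode letter is a hypothetical spelling; the thread
    has not settled how the multi-transaction setting would be written.
    """
    combos = []
    for (letter, what), multi in product(GENERATION_MODES.items(), (False, True)):
        txn = "one transaction per scale" if multi else "single transaction"
        combos.append((letter + ("M" if multi else ""), f"{what}, {txn}"))
    return combos


if __name__ == "__main__":
    for spec, text in describe_combinations():
        print(f"{spec:<3}{text}")
```

Four generation modes times the on/off "M" setting gives the 8
combinations mentioned in the proposal.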

> It would be reasonable to first collect performance measurements for
> these modes and then decide whether to keep them, before proceeding with
> a full implementation including their selection.

Since the logic will be slightly different under your proposal, a new
set of measurements will certainly be required.

My main motivation for splitting one huge table-filling transaction
into smaller ones comes from another idea that was put on the back
burner: running data population via multiple threads. This idea is
implemented in the above-mentioned patch by Mircea Cadariu.
Judging by the amount of changes in that patch, the two patches are
roughly equal in number of lines. Hence putting that change into my
patch would be overwhelming for any reviewer.

Another reason for smaller transactions ("one per scale") is my
experience generating test databases that are much bigger than host
RAM (e.g., scale=5000). The data population phase is not just slow;
more often than not it has to go through multiple checkpoints within
such a single transaction, because even my max_wal_size was smaller
than the size of such a "change". One might argue that my DB is not
tuned properly, but that is a topic for another day. As a side effect,
the decision to use multiple transactions raises another issue: the
inability to use the FREEZE optimisation for COPY commands (FREEZE
requires the table to have been created or truncated in the same
transaction), which in turn leads to an autovacuum storm even during
the data population process itself.
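
To give a sense of the sizes involved, here is a small arithmetic sketch of
what "one transaction per scale" means for pgbench_accounts, which holds
100000 rows per unit of scale. The function names are made up for this
example, and splitting by contiguous aid ranges is an assumption about how
the batches would be cut, not a description of the patch.

```python
ACCOUNTS_PER_SCALE = 100_000  # pgbench_accounts rows per unit of scale


def per_scale_batches(scale):
    """Yield (first_aid, last_aid) for each hypothetical per-scale transaction."""
    for s in range(1, scale + 1):
        first_aid = (s - 1) * ACCOUNTS_PER_SCALE + 1
        last_aid = s * ACCOUNTS_PER_SCALE
        yield first_aid, last_aid


def total_accounts(scale):
    """Total number of pgbench_accounts rows at the given scale."""
    return scale * ACCOUNTS_PER_SCALE


if __name__ == "__main__":
    scale = 5000
    batches = list(per_scale_batches(scale))
    print(f"rows in total:        {total_accounts(scale)}")
    print(f"transactions:         {len(batches)}")
    print(f"first batch aids:     {batches[0]}")
    print(f"last batch aids:      {batches[-1]}")
```

At scale=5000 that is 500 million rows loaded in 5000 transactions of
100000 rows each, instead of one transaction covering all of them.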

> Thus, I propose reconsidering the approach to data generation modes
> and adding a setting to control the number of transactions.
> I also suggest conducting new, more accurate performance measurements to
> inform the decision on the necessity of the additional generation modes.

Agreed on both counts. Both make perfect sense.

Best regards,
Boris

In response to

RE: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY) at 2026-01-30 13:22:30 from Madyshev Egor
