RE: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)

From:	Madyshev Egor <E(dot)Madyshev(at)ftdata(dot)ru>
To:	Boris Mironov <boris_mironov(at)outlook(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	RE: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)
Date:	2026-01-30 13:22:30
Message-ID:	c4a589d12584455b9197ef38899297a6@localhost.localdomain
Lists:	pgsql-hackers

Hello Boris,

I've reviewed and tested your patch. In some modes, I did observe a
performance improvement. However, in my opinion, the current set of
modes is not transparent enough.

I propose, by analogy with the existing 'g'/'G' modes, to use lowercase
letters for client-side data generation and uppercase letters for
server-side generation. Furthermore, I propose considering making the
"one transaction per scale" mode a separate setting. This would result
in the following modes:
1. g: COPY .. FROM STDIN .. TEXT, single transaction (orig. mode)
2. c: COPY .. FROM STDIN .. BINARY, single transaction (added mode)
3. G: INSERT .. SELECT generate_series, single transaction (orig. mode)
4. I: INSERT .. SELECT unnest, single transaction (added mode)
In addition, M (multiple transactions): a setting that, when specified,
makes the chosen mode use one transaction per scale instead of a single
transaction. Combined with the four modes above, this would yield 8
possible combinations.
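
To make the proposed matrix concrete, here is a small sketch in Python
(not pgbench code) that enumerates the combinations. The mode letters are
the ones proposed above; the function name and the convention of
appending 'M' to the mode letter are assumptions made only for
illustration.

```python
from itertools import product

# Mode letters as proposed in this message; descriptions are shorthand.
BASE_MODES = {
    "g": "client-side, COPY .. FROM STDIN .. TEXT",
    "c": "client-side, COPY .. FROM STDIN .. BINARY",
    "G": "server-side, INSERT .. SELECT generate_series",
    "I": "server-side, INSERT .. SELECT unnest",
}


def enumerate_combinations():
    """Return (spec, description) for every mode, without and with 'M'."""
    combos = []
    for (letter, desc), multi in product(BASE_MODES.items(), (False, True)):
        # Hypothetical spelling: 'M' is appended to the mode letter.
        spec = letter + ("M" if multi else "")
        txn = "one transaction per scale" if multi else "single transaction"
        combos.append((spec, f"{desc}, {txn}"))
    return combos


if __name__ == "__main__":
    for spec, desc in enumerate_combinations():
        print(f"{spec:3} {desc}")
```

How the setting is actually spelled on the command line is an open
question; the sketch only shows that the two dimensions are independent.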

It would be reasonable to first collect performance measurements for
these modes and then decide whether to keep them, before proceeding with
a full implementation including their selection.
The provided measurements do not seem conclusive. For example, in mode
G(1) at scale 200 the result was 46.64 and the next measurement, for
G(2), was 56.6, while mode i(3) falls between them at 47.63. Could you
please describe how you collected the performance measurements? Is it
possible that measurement deviations significantly affected the
results shown in the table? It would be better to take each measurement
several times and then present a table with averages.
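
As a rough illustration of what I mean, here is a minimal Python sketch
that reduces repeated timings to a mean and a sample standard deviation,
and flags whether two modes differ by more than their combined spread.
The function names are made up for this example, and the comparison is a
crude heuristic rather than a proper statistical test.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) of repeated timings."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(samples), statistics.stdev(samples)


def clearly_different(a, b):
    """Crude check: is the gap between the means larger than the sum of
    the two standard deviations? Not a substitute for a real test."""
    mean_a, sd_a = summarize(a)
    mean_b, sd_b = summarize(b)
    return abs(mean_a - mean_b) > (sd_a + sd_b)
```

Each list passed in would hold the timings of several runs of one mode
at one scale; reporting the spread next to the average makes it visible
when two modes are within noise of each other.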

> I also switched from one huge transaction for COPY FROM BINARY to
> 'one per scale'. This will simplify merge with multi-threaded data load
> proposed by Mircea. Unfortunately, it killed possibility to freeze
> data right away, which was possible when table truncation and data load
> was done in the same transaction.

It seems wrong to me to abandon the freeze optimization solely because
of another patch, especially one that is not yet in master. Please
explain in more detail why one transaction per scale is more beneficial
than a single transaction combined with the freeze optimization. Having
a setting to switch the transaction mode would avoid this trade-off.
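
To spell out the trade-off: COPY with the FREEZE option only applies
when the table was created or truncated in the current (sub)transaction.
The Python sketch below lists the statements a loader might issue in
each case. It is a simplification, not what the patch does: only
pgbench_accounts is shown, the function name is hypothetical, and
placing the truncation in its own transaction in the per-scale case is
my assumption.

```python
def load_plan(scale, per_scale_txn):
    """Return an illustrative list of SQL statements for loading data."""
    copy_frozen = "COPY pgbench_accounts FROM STDIN WITH (FREEZE ON)"
    copy_plain = "COPY pgbench_accounts FROM STDIN"

    if not per_scale_txn:
        # Truncation and load share one transaction, so FREEZE is allowed.
        return ["BEGIN", "TRUNCATE pgbench_accounts", copy_frozen, "COMMIT"]

    # Assumed layout: truncate first, then one transaction per scale unit.
    # The table was not truncated in these later transactions, so FREEZE
    # cannot be used for them.
    plan = ["BEGIN", "TRUNCATE pgbench_accounts", "COMMIT"]
    for _ in range(scale):
        plan += ["BEGIN", copy_plain, "COMMIT"]
    return plan
```

With a separate setting, the user could choose between the first plan
(frozen rows, one large transaction) and the second (smaller
transactions, no freeze) instead of the choice being fixed per mode.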

Thus, I propose reconsidering the approach to data generation modes
and adding a setting to control the number of transactions.
I also suggest conducting new, more accurate performance measurements to
inform the decision on the necessity of the additional generation modes.

Best regards,
Egor

In response to

Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY) at 2026-01-30 07:06:30 from Boris Mironov

Responses

Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY) at 2026-01-30 14:47:54 from Boris Mironov
