
Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)

From:	Boris Mironov <boris_mironov(at)outlook(dot)com>
To:	Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY)
Date:	2025-11-21 13:26:05
Message-ID:	PH0PR08MB702059D610C7D84594CD3BB388D5A@PH0PR08MB7020.namprd08.prod.outlook.com
Lists:	pgsql-hackers

Hi Ashutosh,

Just wanted to let you know that I've submitted this patch
to CommitFest (see https://commitfest.postgresql.org/patch/6245/).

Interestingly enough, there is one more pgbench patch in the same
CommitFest, from Mircea Cadariu (https://commitfest.postgresql.org/patch/6242/).
That patch was submitted a few days ago and proposes running the
data generation phase in parallel threads. It shows significant
performance improvements over the original single-threaded code.

Hopefully sooner or later pgbench will get significant performance
gains in data generation from these two patches.

The original version of my patch failed the GitHub tests, so I have
started posting updated versions here.

Attached is an updated version that sets a default value for the filler
columns. This trick significantly shrinks network traffic for COPY FROM
BINARY. The absence of the filler column from the data stream is what
made my original patch fail in the GitHub pipeline.

I also switched COPY FROM BINARY from one huge transaction to
"one per scale". This will simplify merging with the multi-threaded
data load proposed by Mircea. Unfortunately, it removed the possibility
of freezing the data right away, which was possible when the table
truncation and the data load were done in the same transaction.

I think it would be fair to keep all the original data generation modes
until the official review in CommitFest. Hence "INSERT SELECT FROM UNNEST"
stays for now, as the community might be interested in benchmarking
columnar tables (e.g., for OLAP workloads or TimescaleDB).

Best regards,
Boris

Attachment	Content-Type	Size
pgbench.c.diff	application/octet-stream	32.3 KB

In response to

Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY) at 2025-11-17 12:43:52 from Boris Mironov
