| From: | Boris Mironov <boris_mironov(at)outlook(dot)com> |
|---|---|
| To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
| Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Idea to enhance pgbench by more modes to generate data (multi-TXNs, UNNEST, COPY BINARY) |
| Date: | 2025-11-21 13:26:05 |
| Message-ID: | PH0PR08MB702059D610C7D84594CD3BB388D5A@PH0PR08MB7020.namprd08.prod.outlook.com |
| Lists: | pgsql-hackers |
Hi Ashutosh,
Just wanted to let you know that I've submitted this patch
to the CommitFest (see https://commitfest.postgresql.org/patch/6245/).
Interestingly enough, there is one more pgbench patch in the same
CommitFest, from Mircea Cadariu (https://commitfest.postgresql.org/patch/6242/).
That patch was submitted a few days ago and proposes running the
data generation phase in parallel threads. It shows a significant
performance improvement over the original single-threaded code.
Hopefully sooner or later pgbench will get significant performance
gains in data generation from these two patches.
The original version of my patch failed the GitHub tests, so I have
to start posting updated versions here.
Attached is an updated version that sets a default value for the filler columns.
This trick makes it possible to significantly reduce network traffic for
COPY FROM BINARY. The absence of the filler column from the data stream is
what made my original patch fail in the GitHub pipeline.
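To give a rough feel for the saving, here is a minimal sketch (not code from the patch) that encodes one pgbench_accounts row in PostgreSQL's COPY binary format, with and without the filler field. It assumes that aid, bid and abalance are int4, that filler is char(84), and that the filler would otherwise travel as 84 blank-padded bytes; the patch may encode things differently. The helper names are made up for illustration. When COPY is given a column list, the server fills the omitted columns from their defaults.

```python
import struct

# COPY BINARY header: 11-byte signature, 32-bit flags, 32-bit extension length.
HEADER = b"PGCOPY\n\xff\r\n\x00" + struct.pack("!ii", 0, 0)
TRAILER = struct.pack("!h", -1)  # a field count of -1 marks end of data

FILLER_WIDTH = 84  # assumption: pgbench_accounts.filler is char(84)


def int4(value):
    """Encode an int4 value in network byte order."""
    return struct.pack("!i", value)


def encode_row(fields):
    """Encode one tuple: 16-bit field count, then (32-bit length, bytes) per field."""
    parts = [struct.pack("!h", len(fields))]
    for data in fields:
        parts.append(struct.pack("!i", len(data)))
        parts.append(data)
    return b"".join(parts)


def account_row(aid, bid, with_filler):
    """Hypothetical helper: one pgbench_accounts row, filler optional."""
    fields = [int4(aid), int4(bid), int4(0)]  # aid, bid, abalance
    if with_filler:
        fields.append(b" " * FILLER_WIDTH)  # assumed blank-padded filler
    return encode_row(fields)


with_filler = len(account_row(1, 1, True))
without_filler = len(account_row(1, 1, False))
print(with_filler, without_filler)
```

Under these assumptions a row shrinks from 114 to 26 bytes on the wire.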
I also switched COPY FROM BINARY from one huge transaction to
"one transaction per scale". This will simplify merging with the
multi-threaded data load proposed by Mircea. Unfortunately, it rules out
freezing the data right away, which was possible only while table truncation
and data load were done in the same transaction.
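The trade-off can be sketched as two statement sequences. This is only an illustration, not the patch's code: the function names are hypothetical, the column list and the choice to truncate only pgbench_accounts are assumptions, and 100000 accounts per scale unit is pgbench's usual ratio. COPY's FREEZE option is accepted only when the table was created or truncated in the current (sub)transaction, which is why the per-scale variant has to drop it.

```python
NACCOUNTS = 100_000  # pgbench_accounts rows generated per scale unit

COPY_COLUMNS = "pgbench_accounts (aid, bid, abalance)"  # assumed column list


def single_transaction_plan(scale):
    """One transaction: TRUNCATE and COPY together, so FREEZE is permitted."""
    return [
        "BEGIN",
        "TRUNCATE pgbench_accounts",
        f"COPY {COPY_COLUMNS} FROM STDIN (FORMAT binary, FREEZE)"
        f"  -- aid 1..{scale * NACCOUNTS}",
        "COMMIT",
    ]


def per_scale_plan(scale):
    """Truncate once, then one transaction per scale unit; no FREEZE, because
    the table is not truncated inside the loading transactions."""
    plan = ["TRUNCATE pgbench_accounts"]
    for unit in range(scale):
        first = unit * NACCOUNTS + 1
        last = (unit + 1) * NACCOUNTS
        plan += [
            "BEGIN",
            f"COPY {COPY_COLUMNS} FROM STDIN (FORMAT binary)"
            f"  -- aid {first}..{last}",
            "COMMIT",
        ]
    return plan


for statement in per_scale_plan(2):
    print(statement)
```

Independent per-scale transactions are also the natural unit to hand out to worker threads, at the cost of leaving the rows unfrozen after the load.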
I think it would be fair to keep all the original data generation modes
until the official review in the CommitFest. Hence "INSERT SELECT FROM UNNEST"
stays for now, as the community might be interested in benchmarking
columnar tables (e.g., for OLAP loads or TimescaleDB).
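For readers unfamiliar with the UNNEST mode, a minimal sketch of the idea follows. It assumes one array parameter per column and int4 columns; the statement text, the helper names and the batch layout are illustrative and not taken from the patch. The bid formula mirrors pgbench's usual mapping of 100000 accounts to each branch.

```python
# Assumed statement shape: one array parameter per column, expanded row-wise.
UNNEST_INSERT = (
    "INSERT INTO pgbench_accounts (aid, bid, abalance) "
    "SELECT * FROM unnest($1::int[], $2::int[], $3::int[])"
)


def int_array_literal(values):
    """Render integers as a PostgreSQL array literal such as '{1,2,3}'."""
    return "{" + ",".join(str(int(v)) for v in values) + "}"


def unnest_batch(first_aid, count, naccounts=100_000):
    """Hypothetical helper: build the three array parameters for one batch."""
    aids = range(first_aid, first_aid + count)
    bids = [(aid - 1) // naccounts + 1 for aid in aids]
    balances = [0] * count
    return tuple(int_array_literal(col) for col in (aids, bids, balances))


print(UNNEST_INSERT)
print(unnest_batch(1, 3))
```

Each batch travels as one column-wise array per column, so a single statement inserts the whole batch.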
Best regards,
Boris
| Attachment | Content-Type | Size |
|---|---|---|
| pgbench.c.diff | application/octet-stream | 32.3 KB |