| From: | lakshmi <lakshmigcdac(at)gmail(dot)com> |
|---|---|
| To: | Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: parallel data loading for pgbench -i |
| Date: | 2026-01-19 09:25:43 |
| Message-ID: | CAEvyyTj0rEsgcQOQgkARbRPbupHR_mc=TUzHBBLNzd8JByUUTw@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Mircea,
I tested the patch on 19devel and it worked well for me.
Before applying it, -j is rejected in pgbench initialization mode as
expected. After applying the patch, pgbench -i -s 100 -j 10 runs
successfully and shows a clear speedup.
On my system the total runtime dropped to about 9.6s, with client-side data
generation around 3.3s.
I also checked correctness after the run: the row counts for
pgbench_accounts, pgbench_branches, and pgbench_tellers all match the
expected values.
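For anyone who wants to repeat that check, here is a minimal sketch. It
assumes the per-scale row counts given in the pgbench documentation (1
branch, 10 tellers and 100000 accounts per unit of scale factor). The helper
names are hypothetical and only build the SQL text; they do not connect to a
server.

```python
# Rows pgbench creates per unit of scale factor, per the pgbench documentation.
ROWS_PER_SCALE = {
    "pgbench_branches": 1,
    "pgbench_tellers": 10,
    "pgbench_accounts": 100_000,
}


def expected_row_counts(scale):
    """Return the expected row count of each table for a given -s value."""
    return {table: rows * scale for table, rows in ROWS_PER_SCALE.items()}


def verification_queries(scale):
    """Build one SQL statement per table comparing count(*) to the expectation."""
    return [
        f"SELECT count(*) = {expected} AS ok FROM {table};"
        for table, expected in expected_row_counts(scale).items()
    ]


if __name__ == "__main__":
    # Scale factor 100, as used in the runs discussed in this thread.
    for query in verification_queries(100):
        print(query)
```

Each generated statement can be pasted into psql; a result of "t" means the
table has the expected number of rows.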
Thanks for working on this; the improvement is very noticeable.
Best regards,
lakshmi
On Mon, Jan 19, 2026 at 2:30 PM Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>
wrote:
> Hi,
>
> I propose a patch for speeding up pgbench -i through multithreading.
>
> To enable this, pass -j and then the number of workers you want to use.
>
> Here are some results I got on my laptop:
>
>
> master
>
> ---
>
> -i -s 100
> done in 20.95 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 14.51 s, vacuum 0.27 s, primary keys 6.16 s).
>
> -i -s 100 --partitions=10
> done in 29.73 s (drop tables 0.00 s, create tables 0.02 s, client-side
> generate 16.33 s, vacuum 8.72 s, primary keys 4.67 s).
>
>
> patch (-j 10)
>
> ---
>
> -i -s 100 -j 10
> done in 18.64 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 5.82 s, vacuum 6.89 s, primary keys 5.93 s).
>
> -i -s 100 -j 10 --partitions=10
> done in 14.66 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 8.42 s, vacuum 1.55 s, primary keys 4.68 s).
>
> The speedup is more significant for the partitioned use-case. This is
> because all workers can use COPY FREEZE (thus incurring a lower vacuum
> penalty) because they create their separate partitions.
>
> For the non-partitioned case the speedup is lower, but I observe it
> improves somewhat with larger scale factors. When parallel vacuum
> support is merged, this should further reduce the time.
>
> I'd still need to update docs, tests, better integrate the code with its
> surroundings, and other aspects. Would appreciate any feedback on what I
> have so far though. Thanks!
>
> Kind regards,
>
> Mircea Cadariu
>
>
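The totals quoted above can be turned into speedup ratios with a few lines of
arithmetic. This is only a sketch over the four "done in" figures from the
quoted message; the dictionary layout and the function name are my own
choices for illustration.

```python
# Total "done in" times in seconds, copied from the quoted results
# (pgbench -i -s 100, master vs. patch with -j 10).
TIMINGS = {
    "non-partitioned": {"master": 20.95, "patch": 18.64},
    "partitioned (--partitions=10)": {"master": 29.73, "patch": 14.66},
}


def speedup(master_seconds, patch_seconds):
    """Ratio of the baseline runtime to the patched runtime."""
    return master_seconds / patch_seconds


if __name__ == "__main__":
    for case, t in TIMINGS.items():
        print(f"{case}: {speedup(t['master'], t['patch']):.2f}x")
    # non-partitioned: 1.12x
    # partitioned (--partitions=10): 2.03x
```

These ratios line up with the observation that the partitioned case benefits
more from the patch.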