Re: parallel data loading for pgbench -i

From:	lakshmi <lakshmigcdac(at)gmail(dot)com>
To:	Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: parallel data loading for pgbench -i
Date:	2026-01-19 09:25:43
Message-ID:	CAEvyyTj0rEsgcQOQgkARbRPbupHR_mc=TUzHBBLNzd8JByUUTw@mail.gmail.com
Lists:	pgsql-hackers

Hi Mircea,

I tested the patch on 19devel and it worked well for me.
Without the patch, -j is rejected in pgbench initialization mode, as
expected. With the patch applied, pgbench -i -s 100 -j 10 runs
successfully and shows a clear speedup.
On my system the total runtime dropped to about 9.6 s, with client-side data
generation taking around 3.3 s.
I also checked correctness after the run: the row counts for pgbench_accounts,
pgbench_branches, and pgbench_tellers all match the expected values.
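
For reference, the expected values follow from the scale factor: at scale 1,
pgbench creates 1 row in pgbench_branches, 10 in pgbench_tellers, and 100,000
in pgbench_accounts, and each count is multiplied by -s. A minimal sketch of
that arithmetic (the function name is only illustrative, not part of pgbench
or the patch):

```python
# Rows created by "pgbench -i" at scale factor 1, per the pgbench docs.
ROWS_PER_SCALE_UNIT = {
    "pgbench_branches": 1,
    "pgbench_tellers": 10,
    "pgbench_accounts": 100_000,
}


def expected_row_counts(scale):
    """Return the expected row count of each table for a given -s value."""
    if scale < 1:
        raise ValueError("scale factor must be at least 1")
    return {table: rows * scale for table, rows in ROWS_PER_SCALE_UNIT.items()}


if __name__ == "__main__":
    # Scale factor used in the test run described above.
    for table, rows in expected_row_counts(100).items():
        print(f"{table}: {rows}")
```

These numbers can then be compared against SELECT count(*) on each table.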

Thanks for working on this, the improvement is very noticeable.

Best regards,
lakshmi

On Mon, Jan 19, 2026 at 2:30 PM Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>
wrote:

> Hi,
>
> I propose a patch for speeding up pgbench -i through multithreading.
>
> To enable this, pass -j and then the number of workers you want to use.
>
> Here are some results I got on my laptop:
>
>
> master
>
> ---
>
> -i -s 100
> done in 20.95 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 14.51 s, vacuum 0.27 s, primary keys 6.16 s).
>
> -i -s 100 --partitions=10
> done in 29.73 s (drop tables 0.00 s, create tables 0.02 s, client-side
> generate 16.33 s, vacuum 8.72 s, primary keys 4.67 s).
>
>
> patch (-j 10)
>
> ---
>
> -i -s 100 -j 10
> done in 18.64 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 5.82 s, vacuum 6.89 s, primary keys 5.93 s).
>
> -i -s 100 -j 10 --partitions=10
> done in 14.66 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 8.42 s, vacuum 1.55 s, primary keys 4.68 s).
>
> The speedup is more significant for the partitioned use-case. This is
> because the workers create their own separate partitions, so all of
> them can use COPY FREEZE and thus incur a lower vacuum penalty.
>
> For the non-partitioned case the speedup is lower, but I observe it
> improves somewhat with larger scale factors. When parallel vacuum
> support is merged, this should further reduce the time.
>
> I still need to update the docs and tests, integrate the code better
> with its surroundings, and address other aspects. I would appreciate any
> feedback on what I have so far, though. Thanks!
>
> Kind regards,
>
> Mircea Cadariu
>
>
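
The ratios implied by the timings quoted above can be worked out directly.
This is only a small sketch: the numbers are copied from the quoted output,
and the dictionary layout and function name are illustrative.

```python
# Timings in seconds, copied from the quoted "pgbench -i -s 100" output.
timings = {
    "non-partitioned": {
        "master": {"total": 20.95, "generate": 14.51},
        "patch": {"total": 18.64, "generate": 5.82},
    },
    "partitions=10": {
        "master": {"total": 29.73, "generate": 16.33},
        "patch": {"total": 14.66, "generate": 8.42},
    },
}


def speedup(before, after):
    """Ratio of the master timing to the patched (-j 10) timing."""
    return before / after


if __name__ == "__main__":
    for case, runs in timings.items():
        for phase in ("total", "generate"):
            ratio = speedup(runs["master"][phase], runs["patch"][phase])
            print(f"{case:16s} {phase:9s} {ratio:.2f}x")
```

In the non-partitioned run the vacuum phase grows from 0.27 s to 6.89 s, which
offsets much of the gain in the generate phase and fits the explanation about
COPY FREEZE given in the quoted message.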

In response to

parallel data loading for pgbench -i at 2025-11-17 12:46:12 from Mircea Cadariu
