Re: parallel data loading for pgbench -i

From: lakshmi <lakshmigcdac(at)gmail(dot)com>
To: Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: parallel data loading for pgbench -i
Date: 2026-01-19 09:25:43
Message-ID: CAEvyyTj0rEsgcQOQgkARbRPbupHR_mc=TUzHBBLNzd8JByUUTw@mail.gmail.com
Lists: pgsql-hackers

Hi Mircea,

I tested the patch on 19devel and it worked well for me.
Before applying it, -j is rejected in pgbench initialization mode as
expected. After applying the patch, pgbench -i -s 100 -j 10 runs
successfully and shows a clear speedup.
On my system the total runtime dropped to about 9.6s, with client-side data
generation around 3.3s.
I also checked correctness after the run: row counts for pgbench_accounts,
pgbench_branches, and pgbench_tellers all match the expected values.
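
For anyone repeating the check, the expected counts follow from pgbench's
standard sizing: per unit of scale factor, 1 row in pgbench_branches, 10 in
pgbench_tellers and 100000 in pgbench_accounts. A minimal Python sketch of
that arithmetic (the function name and dictionary layout are my own, not
part of pgbench or the patch):

```python
# Rows created per unit of scale factor by "pgbench -i" (standard sizing).
ROWS_PER_SCALE = {
    "pgbench_branches": 1,
    "pgbench_tellers": 10,
    "pgbench_accounts": 100_000,
}

def expected_rows(scale):
    """Return the expected row count of each table for a given -s value."""
    return {table: per_unit * scale for table, per_unit in ROWS_PER_SCALE.items()}

# Print the numbers to compare against SELECT count(*) on each table.
for table, rows in expected_rows(100).items():
    print(f"{table}: {rows}")
```

The printed values can be compared with the result of SELECT count(*) on
each table after initialization.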

Thanks for working on this; the improvement is very noticeable.

Best regards,
lakshmi

On Mon, Jan 19, 2026 at 2:30 PM Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>
wrote:

> Hi,
>
> I propose a patch for speeding up pgbench -i through multithreading.
>
> To enable this, pass -j and then the number of workers you want to use.
>
> Here are some results I got on my laptop:
>
>
> master
>
> ---
>
> -i -s 100
> done in 20.95 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 14.51 s, vacuum 0.27 s, primary keys 6.16 s).
>
> -i -s 100 --partitions=10
> done in 29.73 s (drop tables 0.00 s, create tables 0.02 s, client-side
> generate 16.33 s, vacuum 8.72 s, primary keys 4.67 s).
>
>
> patch (-j 10)
>
> ---
>
> -i -s 100 -j 10
> done in 18.64 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 5.82 s, vacuum 6.89 s, primary keys 5.93 s).
>
> -i -s 100 -j 10 --partitions=10
> done in 14.66 s (drop tables 0.00 s, create tables 0.01 s, client-side
> generate 8.42 s, vacuum 1.55 s, primary keys 4.68 s).
>
> The speedup is more significant for the partitioned use-case. This is
> because all workers can use COPY FREEZE (thus incurring a lower vacuum
> penalty) because they create their separate partitions.
>
> For the non-partitioned case the speedup is lower, but I observe it
> improves somewhat with larger scale factors. When parallel vacuum
> support is merged, this should further reduce the time.
>
> I'd still need to update docs, tests, better integrate the code with its
> surroundings, and other aspects. Would appreciate any feedback on what I
> have so far though. Thanks!
>
> Kind regards,
>
> Mircea Cadariu
>
>
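
For reference, the speedups implied by the timings quoted above can be
worked out with a few lines of Python. The numbers are copied from the
quoted results; the dictionary keys and the helper name are only for
illustration.

```python
# Timings in seconds, copied from the quoted results (scale factor 100).
# Keys and structure are illustrative only.
timings = {
    "non-partitioned": {
        "master": {"total": 20.95, "generate": 14.51},
        "patch": {"total": 18.64, "generate": 5.82},
    },
    "partitioned": {
        "master": {"total": 29.73, "generate": 16.33},
        "patch": {"total": 14.66, "generate": 8.42},
    },
}

def speedup(case, phase):
    """Ratio of the master time to the patched (-j 10) time for one phase."""
    t = timings[case]
    return t["master"][phase] / t["patch"][phase]

for case in timings:
    print(f"{case}: total x{speedup(case, 'total'):.2f}, "
          f"client-side generate x{speedup(case, 'generate'):.2f}")
```

On these numbers the total time improves by roughly 1.1x without partitions
and about 2x with --partitions=10, which is consistent with the observation
that the partitioned case benefits more.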
