Re: parallel data loading for pgbench -i

From: lakshmi <lakshmigcdac(at)gmail(dot)com>
To: Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "tomas(at)vondra(dot)me" <tomas(at)vondra(dot)me>
Subject: Re: parallel data loading for pgbench -i
Date: 2026-04-13 06:14:18
Message-ID: CAEvyyTjt1_QXO_37h1hbVqWdONm+uopV74j3K2pS5VrLKmozsw@mail.gmail.com
Lists: pgsql-hackers

Hi Mircea, Heikki,

I tested the v3 patch on 19devel with larger scale factors.

The behavior looks much better now compared to the earlier versions. For
scale 100 and 500, I see clear improvements in overall runtime, and for
scale 2000, the total time is around 97s on my system.

The loading phase now runs concurrently across workers, and I don’t see the
earlier serialization behavior anymore.

The VACUUM phase also remains relatively small (~6s for scale 2000), which
suggests that the previous overhead has been addressed.

I also verified correctness, and the row counts match the expected values.

Overall, the partitioned parallel approach looks solid and scales well in
my tests.

Thanks again for the work on this.

Best regards,
Lakshmi

On Sat, Apr 11, 2026 at 12:07 AM Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>
wrote:

> Hi,
>
> On 07/04/2026 10:00, Heikki Linnakangas wrote:
> >
> > This all makes more sense in the partitioned case. Perhaps we should
> > parallelize only when partitions are used, and use only one thread
> > per partition.
> >
> Thanks for having a look. I attached v3 that parallelizes only the
> partitioned case, one thread per partition. Results:
>
>
> patch:
>
> pgbench -i -s 100 --partitions 10
>
> done in 12.63 s (drop tables 0.05 s, create tables 0.01 s, client-side
> generate 5.98 s, vacuum 1.63 s, primary keys 4.96 s).
>
>
> master:
>
> pgbench -i -s 100 --partitions 10
>
> done in 29.29 s (drop tables 0.00 s, create tables 0.02 s, client-side
> generate 16.31 s, vacuum 7.78 s, primary keys 5.18 s).
>
> --
> Thanks,
> Mircea Cadariu
>
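
The two summary lines quoted above can be compared phase by phase. The sketch below is only an illustration: it parses the "done in ..." line that pgbench -i prints and computes master/patch ratios. The function names are hypothetical, and the input strings are the two lines from Mircea's message with the line wrapping removed.

```python
import re

PATCH = ("done in 12.63 s (drop tables 0.05 s, create tables 0.01 s, "
         "client-side generate 5.98 s, vacuum 1.63 s, primary keys 4.96 s).")
MASTER = ("done in 29.29 s (drop tables 0.00 s, create tables 0.02 s, "
          "client-side generate 16.31 s, vacuum 7.78 s, primary keys 5.18 s).")


def parse_summary(line: str) -> dict[str, float]:
    """Parse a pgbench -i summary line into {phase name: seconds}."""
    text = " ".join(line.split())  # collapse wrapping and extra whitespace
    match = re.search(r"done in ([\d.]+) s \((.*)\)", text)
    if match is None:
        raise ValueError("not a pgbench -i summary line")
    timings = {"total": float(match.group(1))}
    for part in match.group(2).split(", "):
        phase = re.fullmatch(r"(.+) ([\d.]+) s", part)
        if phase is None:
            raise ValueError(f"unrecognized phase entry: {part!r}")
        timings[phase.group(1)] = float(phase.group(2))
    return timings


def speedups(master: dict[str, float], patch: dict[str, float]) -> dict[str, float]:
    """Return master/patch time ratios for phases present in both runs."""
    return {
        phase: master[phase] / seconds
        for phase, seconds in patch.items()
        if phase in master and seconds > 0
    }


patched = parse_summary(PATCH)
baseline = parse_summary(MASTER)
ratios = speedups(baseline, patched)

# Only the phases long enough for the ratio to be meaningful.
for phase in ("total", "client-side generate", "vacuum", "primary keys"):
    print(f"{phase}: {ratios[phase]:.2f}x")
```

For these two runs the overall ratio works out to roughly 2.3x. The ratios for the drop tables and create tables phases are left out of the printout because those timings are too small to compare reliably.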
