Re: Parallel copy

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-03 10:15:53
Message-ID: CAA4eK1L+aJ6O2u1_PEss_9FKgS8fUyWufFpZFk+jHbWvQoDkOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 3, 2020 at 6:20 AM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Hello Vignesh,
>
> I've done some basic benchmarking on the v4 version of the patches (but
> AFAIKC the v5 should perform about the same), and some initial review.
>
> For the benchmarking, I used the lineitem table from TPC-H - for 75GB
> data set, this largest table is about 64GB once loaded, with another
> 54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and
> NVME storage.
>
> The COPY duration with varying number of workers (specified using the
> parallel COPY option) looks like this:
>
> workers duration
> ---------------------
> 0 1366
> 1 1255
> 2 704
> 3 526
> 4 434
> 5 385
> 6 347
> 7 322
> 8 327
>
> So this seems to work pretty well - initially we get almost linear
> speedup, then it slows down (likely due to contention for locks, I/O
> etc.). Not bad.
>

+1. These numbers (> 4x speed up) look good to me.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Petru Ghita 2020-10-03 10:53:35 POC: contrib/unaccent as IMMUTABLE
Previous Message Amit Kapila 2020-10-03 07:55:59 Re: Resetting spilled txn statistics in pg_stat_replication