Re: Parallel copy

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Parallel copy
Date: 2020-08-27 12:12:49
Message-ID: CAA4eK1+FDd=yH=YdvzCJxRCZjFRP-5iV73B83=1uSnwxaO2STw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 27, 2020 at 4:56 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > >
> > > > I have attached a new set of patches with the fixes.
> > > > Thoughts?
> > >
> > > Hi Vignesh,
> > >
> > > I don't really have any further comments on the code, but would like
> > > to share some results of some Parallel Copy performance tests I ran
> > > (attached).
> > >
> > > The tests loaded a 5GB CSV data file into a 100 column table (of
> > > different data types). The following were varied as part of the test:
> > > - Number of workers (1 – 10)
> > > - No indexes / 4-indexes
> > > - Default settings / increased resources (shared_buffers, work_mem, etc.)
> > >
> > > (I did not do any partition-related tests as I believe those types of
> > > tests were previously performed)
> > >
> > > I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4).
> > > The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.
> > >
> > >
> > > I observed the following trends:
> > > - For the data file size used, Parallel Copy achieved its best performance
> > > using about 9 – 10 workers. Larger data files may benefit from using
> > > more workers. However, I couldn’t really see any better performance,
> > > for example, from using 16 workers on a 10GB CSV data file compared to
> > > using 8 workers. Results may also vary depending on machine
> > > characteristics.
> > > - Parallel Copy with 1 worker ran slower than normal Copy in a couple
> > > of cases (I did question if allowing 1 worker was useful in my patch
> > > review).
> >
> > I think the reason is that in the 1-worker case there is not much
> > parallelization, as the leader doesn't perform the actual load work.
> > Vignesh, can you please check whether the results are reproducible at
> > your end? If so, we can compare the perf profiles to see why we get
> > an improvement in some cases and not in others. Based on that, we
> > can decide whether to allow the 1-worker case or not.
> >
>
> I will spend some time on this and update.
>

Thanks.

--
With Regards,
Amit Kapila.
