From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Greg Nancarrow <gregn4422(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Subject: | Re: Parallel copy |
Date: | 2020-08-27 12:12:49 |
Message-ID: | CAA4eK1+FDd=yH=YdvzCJxRCZjFRP-5iV73B83=1uSnwxaO2STw@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Aug 27, 2020 at 4:56 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > >
> > > > I have attached a new set of patches with the fixes.
> > > > Thoughts?
> > >
> > > Hi Vignesh,
> > >
> > > I don't really have any further comments on the code, but would like
> > > to share some results of some Parallel Copy performance tests I ran
> > > (attached).
> > >
> > > The tests loaded a 5GB CSV data file into a 100 column table (of
> > > different data types). The following were varied as part of the test:
> > > - Number of workers (1 – 10)
> > > - No indexes / 4 indexes
> > > - Default settings / increased resources (shared_buffers, work_mem, etc.)
> > >
> > > (I did not do any partition-related tests as I believe those types of
> > > tests were previously performed)
> > >
> > > I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4).
> > > The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.
> > >
> > >
> > > I observed the following trends:
> > > - For the data file size used, Parallel Copy achieved best performance
> > > using about 9 – 10 workers. Larger data files may benefit from using
> > > more workers. However, I couldn’t really see any better performance,
> > > for example, from using 16 workers on a 10GB CSV data file compared to
> > > using 8 workers. Results may also vary depending on machine
> > > characteristics.
> > > - Parallel Copy with 1 worker ran slower than normal Copy in a couple
> > > of cases (I did question if allowing 1 worker was useful in my patch
> > > review).
> >
> > I think the reason is that in the 1-worker case there is not much
> > parallelism, as the leader doesn't perform the actual load work.
> > Vignesh, can you please check whether the results are reproducible at
> > your end? If so, we can compare the perf profiles to see why we get
> > an improvement in some cases and not in others. Based on that, we
> > can decide whether to allow the 1-worker case.
> >
>
> I will spend some time on this and update.
>
Thanks.
--
With Regards,
Amit Kapila.