Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Parallel copy
Date: 2020-08-27 11:26:45
Message-ID: CALDaNm2EqwK8HggYXLv-Lz5CZKgU6cQWT_GC9C0YQipJPO=0cw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> >
> > > I have attached a new set of patches with the fixes.
> > > Thoughts?
> >
> > Hi Vignesh,
> >
> > I don't really have any further comments on the code, but would like
> > to share some results of some Parallel Copy performance tests I ran
> > (attached).
> >
> > The tests loaded a 5GB CSV data file into a 100 column table (of
> > different data types). The following were varied as part of the test:
> > - Number of workers (1 – 10)
> > - No indexes / 4 indexes
> > - Default settings / increased resources (shared_buffers, work_mem, etc.)
> >
> > (I did not do any partition-related tests as I believe those types of
> > tests were previously performed)
> >
> > I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4).
> > The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.
> >
> >
> > I observed the following trends:
> > - For the data file size used, Parallel Copy achieved best performance
> > using about 9 – 10 workers. Larger data files may benefit from using
> > more workers. However, I couldn’t really see any better performance,
> > for example, from using 16 workers on a 10GB CSV data file compared to
> > using 8 workers. Results may also vary depending on machine
> > characteristics.
> > - Parallel Copy with 1 worker ran slower than normal Copy in a couple
> > of cases (I did question if allowing 1 worker was useful in my patch
> > review).
>
> I think the reason is that in the 1-worker case there is not much
> parallelization, as the leader doesn't perform the actual load work.
> Vignesh, could you please check whether the results are reproducible at
> your end? If so, we can compare the perf profiles to see why we get an
> improvement in some cases and not in others. Based on that, we can
> decide whether to allow the 1-worker case or not.
>

I will spend some time on this and post an update.
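
As a rough way to reason about the 1-worker case quoted above, here is a toy
model only. It is not measured data and not how the patch works internally.
It assumes the leader does no load work itself, that the load divides evenly
across workers, and that parallel mode adds a fixed coordination overhead.
The function name and all numbers are hypothetical.

```python
def estimated_copy_seconds(serial_seconds, workers, overhead_seconds):
    """Toy estimate of COPY elapsed time.

    workers == 0 stands for a normal (non-parallel) COPY.
    For workers >= 1 the leader is assumed to do no load work, so the
    load is split across the workers and a fixed overhead is added.
    """
    if workers == 0:
        return serial_seconds
    return serial_seconds / workers + overhead_seconds


# Hypothetical numbers: a 100 s serial load and 5 s of parallel overhead.
for n in (0, 1, 2, 4, 8):
    print(n, estimated_copy_seconds(100.0, n, 5.0))
```

Under these assumptions, 1 worker does the same load work as a normal Copy
and pays the overhead on top, so it can only come out slower. Whether that
matches the real behaviour is what the perf profiles should show.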

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
