Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-14 13:29:48
Message-ID: CALDaNm0_zUa9+S=pwCz3Yp43SY3r9bnO4v-9ucXUujEE=0Sd7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 3, 2020 at 6:20 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:
>
> Hello Vignesh,
>
> I've done some basic benchmarking on the v4 version of the patches (but
> AFAIKC the v5 should perform about the same), and some initial review.
>
> For the benchmarking, I used the lineitem table from TPC-H - for 75GB
> data set, this largest table is about 64GB once loaded, with another
> 54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and
> NVME storage.
>
> The COPY duration with varying number of workers (specified using the
> parallel COPY option) looks like this:
>
> workers duration
> ---------------------
> 0 1366
> 1 1255
> 2 704
> 3 526
> 4 434
> 5 385
> 6 347
> 7 322
> 8 327
>
> So this seems to work pretty well - initially we get almost linear
> speedup, then it slows down (likely due to contention for locks, I/O
> etc.). Not bad.

Thanks for testing with different workers & posting the results.

> I've only done a quick review, but overall the patch looks in fairly
> good shape.
>
> 1) I don't quite understand why we need INCREMENTPROCESSED and
> RETURNPROCESSED, considering it just does ++ or return. It just
> obfuscated the code, I think.
>

I have removed the macros.

> 2) I find it somewhat strange that BeginParallelCopy can just decide not
> to do parallel copy after all. Why not to do this decisions in the
> caller? Or maybe it's fine this way, not sure.
>

I have moved the check IsParallelCopyAllowed to the caller.

> 3) AFAIK we don't modify typedefs.list in patches, so these changes
> should be removed.
>

I had seen that in many of the commits typedefs.list is getting changed,
also it helps in running pgindent. So I'm retaining this change.

> 4) IsTriggerFunctionParallelSafe actually checks all triggers, not just
> one, so the comment needs minor rewording.
>

Modified the comments.

Thanks for the comments & sharing the test results Tomas, These changes are
fixed in one of my earlier mail [1] that I sent.

[1]
https://www.postgresql.org/message-id/CALDaNm1n1xW43neXSGs%3Dc7zt-mj%2BJHHbubWBVDYT9NfCoF8TuQ%40mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2020-10-14 13:41:57 Re: Parallel copy
Previous Message vignesh C 2020-10-14 13:20:50 Re: Parallel copy