Re: Parallel copy

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Parallel copy
Date: 2020-12-07 08:32:22
Message-ID: CALDaNm32c7kWpZm9UkS5P+ShsfRhyMTWKvFHyn9O53WZWvO2iA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the comments.
> I took a look at the v10 patch set. Here are some comments:
>
> 1.
> +/*
> + * CheckExprParallelSafety
> + *
> + * Determine if where cluase and default expressions are parallel safe & do not
> + * have volatile expressions, return true if condition satisfies else return
> + * false.
> + */
>
> 'cluase' seems a typo.
>

changed.

> 2.
> + /*
> + * Make sure that no worker has consumed this element, if this
> + * line is spread across multiple data blocks, worker would have
> + * started processing, no need to change the state to
> + * LINE_LEADER_POPULATING in this case.
> + */
> + (void) pg_atomic_compare_exchange_u32(&lineInfo->line_state,
> + &current_line_state,
> + LINE_LEADER_POPULATED);
> About the commect
>
> + * started processing, no need to change the state to
> + * LINE_LEADER_POPULATING in this case.
>
> Does it means no need to change the state to LINE_LEADER_POPULATED ' here?
>
>

Yes it is LINE_LEADER_POPULATED, changed accordingly.

> 3.
> + * 3) only one worker should choose one line for processing, this is handled by
> + * using pg_atomic_compare_exchange_u32, worker will change the state to
> + * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
>
> In the latest patch, it will set the state to LINE_WORKER_PROCESSING if line_state is LINE_LEADER_POPULATED or LINE_LEADER_POPULATING.
> So The comment here seems wrong.
>

Updated the comments.

> 4.
> A suggestion for CacheLineInfo.
>
> It use appendBinaryStringXXX to store the line in memory.
> appendBinaryStringXXX will double the str memory when there is no enough spaces.
>
> How about call enlargeStringInfo in advance, if we already know the whole line size?
> It can avoid some memory waste and may impove a little performance.
>

Here we will not know the size beforehand, in some cases we will start
processing the data when current block is populated and keep
processing block by block, we will come to know of the size at the
end. We cannot use enlargeStringInfo because of this.

Attached v11 patch has the fix for this, it also includes the changes
to rebase on top of head.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v11-0001-Copy-code-readjustment-to-support-parallel-copy.patch text/x-patch 22.7 KB
v11-0002-Check-if-parallel-copy-can-be-performed.patch text/x-patch 21.5 KB
v11-0003-Allow-copy-from-command-to-process-data-from-fil.patch text/x-patch 76.1 KB
v11-0004-Documentation-for-parallel-copy.patch text/x-patch 2.7 KB
v11-0005-Parallel-Copy-For-Binary-Format-Files.patch text/x-patch 23.7 KB
v11-0006-Tests-for-parallel-copy.patch text/x-patch 34.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-12-07 08:51:11 Re: Single transaction in the tablesync worker?
Previous Message vignesh C 2020-12-07 08:26:50 Added missing copy related data structures to typedefs.list