Re: Parallel copy

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-09-29 13:00:34
Message-ID: CAA4eK1L-Xgw1zZEbGePmhBBWmEmLFL6rCaiOMDPnq2GNMVz-sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Few additional comments:
> ======================

Some more comments:

v5-0002-Framework-for-leader-worker-in-parallel-copy
===========================================
1.
These values
+ * help in handover of multiple records with significant size of data to be
+ * processed by each of the workers to make sure there is no context
switch & the
+ * work is fairly distributed among the workers.

How about writing it as: "These values help in the handover of
multiple records with the significant size of data to be processed by
each of the workers. This also ensures there is no context switch and
the work is fairly distributed among the workers."

2. Can we keep WORKER_CHUNK_COUNT, MAX_BLOCKS_COUNT, and RINGSIZE as
power-of-two? Say WORKER_CHUNK_COUNT as 64, MAX_BLOCK_COUNT as 1024,
and accordingly choose RINGSIZE. At many places, we do that way. I
think it can sometimes help in faster processing due to cache size
requirements and in this case, I don't see a reason why we can't
choose these values to be power-of-two. If you agree with this change
then also do some performance testing after this change?

3.
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;

Is there a reason to keep skip_bytes after data? Normally the variable
size data is at the end of the structure. Also, there is no comment
explaining the purpose of skip_bytes.

4.
+ * Copy data block information.
+ * ParallelCopyDataBlock's will be created in DSM. Data read from file will be
+ * copied in these DSM data blocks. The leader process identifies the records
+ * and the record information will be shared to the workers. The workers will
+ * insert the records into the table. There can be one or more number
of records
+ * in each of the data block based on the record size.
+ */
+typedef struct ParallelCopyDataBlock

Keep one empty line after the description line like below. I also
suggested to do a minor tweak in the above sentence which is as
follows:

* Copy data block information.
*
* These data blocks are created in DSM. Data read ...

Try to follow a similar format in other comments as well.

5. I think it is better to move parallelism related code to a new file
(we can name it as copyParallel.c or something like that).

6. copy.c(1648,25): warning C4133: 'function': incompatible types -
from 'ParallelCopyLineState *' to 'uint32 *'
Getting above compilation warning on Windows.

v5-0003-Allow-copy-from-command-to-process-data-from-file
==================================================
1.
@@ -4294,7 +5047,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *)
palloc(RAW_BUF_SIZE + 1);

Is there anyway IsParallelCopy can be true by this time? AFAICS, we do
anything about parallelism after this. If you want to save this
allocation then we need to move this after we determine that
parallelism can be used or not and accordingly the below code in the
patch needs to be changed.

* ParallelCopyFrom - parallel copy leader's functionality.
*
* Leader executes the before statement for before statement trigger, if before
@@ -1110,8 +1547,302 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));

+ /* raw_buf is not used in parallel copy, instead data blocks are used.*/
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;

Is there anything else also the allocation of which depends on parallelism?

2.
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}

In the comments, we should write why parallelism is not allowed for a
particular case. The cases where parallel-unsafe clause is involved
are okay but it is not clear from comments why it is not allowed in
other cases.

3.
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);

Can we take all the code here inside function UpdateBlockInLineInfo? I
see that it is called from one other place but I guess most of the
surrounding code there can also be moved inside the function. Can we
change the name of the function to UpdateSharedLineInfo or something
like that and remove inline marking from this? I am not sure we want
to inline such big functions. If it make difference in performance
then we can probably consider it.

4.
EndLineParallelCopy()
{
..
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
..
}

Can we instead call UpdateSharedLineInfo (new function name for
UpdateBlockInLineInfo) to do this and maybe see it only updates the
required info? The idea is to centralize the code for updating
SharedLineInfo.

5.
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;

It seems to me that each worker has to hop through all the processed
chunks before getting the chunk which it can process. This will work
but I think it is better if we have some shared counter which can tell
us the next chunk to be processed and avoid all the unnecessary work
of hopping to find the exact position.

v5-0004-Documentation-for-parallel-copy
-----------------------------------------
1. Can you add one or two examples towards the end of the page where
we have examples for other Copy options?

Please run pgindent on all patches as that will make the code look better.

From the testing perspective,
1. Test by having something force_parallel_mode = regress which means
that all existing Copy tests in the regression will be executed via
new worker code. You can have this as a test-only patch for now and
make sure all existing tests passed with this.
2. Do we have tests for toast tables? I think if you implement the
previous point some existing tests might cover it but I feel we should
have at least one or two tests for the same.
3. Have we checked the code coverage of the newly added code with
existing tests?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-09-29 13:16:03 Re: [HACKERS] logical decoding of two-phase transactions
Previous Message Ashutosh Bapat 2020-09-29 12:40:42 Re: Disable WAL logging to speed up data loading