Re: Parallel copy

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-29 06:15:51
Message-ID: CAA4eK1K30X9DpbNrrp2GccFNJNCKfeoXoBT-f=qU5-ApRhAsuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 27, 2020 at 7:06 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
[latest version]

I think the parallel-safety checks in this patch
(v9-0002-Allow-copy-from-command-to-process-data-from-file) are
incomplete and wrong. See below comments.
1.
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }

I don't understand the above check. Why do we only need to check where
clause for parallel-safety when it contains volatile functions? It
should be checked otherwise as well, no? The similar comment applies
to other checks in this function. Also, I don't think there is a need
to make this function inline.

2.
+/*
+ * IsParallelCopyAllowed
+ *
+ * Check if parallel copy can be allowed.
+ */
+bool
+IsParallelCopyAllowed(CopyState cstate)
{
..
+ * When there are BEFORE/AFTER/INSTEAD OF row triggers on the table. We do
+ * not allow parallelism in such cases because such triggers might query
+ * the table we are inserting into and act differently if the tuples that
+ * have already been processed and prepared for insertion are not there.
+ * Now, if we allow parallelism with such triggers the behaviour would
+ * depend on if the parallel worker has already inserted or not that
+ * particular tuples.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_new_table ||
+ cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_after_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return false;
..

Why do we need to disable parallelism for before/after row triggers
unless they have parallel-unsafe functions? I see a few lines down in
this function you are checking parallel-safety of trigger functions,
what is the use of the same if you are already disabling parallelism
with the above check.

3. What about if the index on table has expressions that are
parallel-unsafe? What is your strategy to check parallel-safety for
partitioned tables?

I suggest checking Greg's patch for parallel-safety of Inserts [1]. I
think you will find that most of those checks are required here as
well and see how we can use that patch (at least what is common). I
feel the first patch should be just to have parallel-safety checks and
we can test that by trying to enable Copy with force_parallel_mode. We
can build the rest of the patch atop of it or in other words, let's
move all parallel-safety work into a separate patch.

Few assorted comments:
========================
1.
+/*
+ * ESTIMATE_NODE_SIZE - Estimate the size required for node type in shared
+ * memory.
+ */
+#define ESTIMATE_NODE_SIZE(list, listStr, strsize) \
+{ \
+ uint32 estsize = sizeof(uint32); \
+ if ((List *)list != NIL) \
+ { \
+ listStr = nodeToString(list); \
+ estsize += strlen(listStr) + 1; \
+ } \
+ \
+ strsize = add_size(strsize, estsize); \
+}

This can be probably a function instead of a macro.

2.
+/*
+ * ESTIMATE_1BYTE_STR_SIZE - Estimate the size required for 1Byte strings in
+ * shared memory.
+ */
+#define ESTIMATE_1BYTE_STR_SIZE(src, strsize) \
+{ \
+ strsize = add_size(strsize, sizeof(uint8)); \
+ strsize = add_size(strsize, (src) ? 1 : 0); \
+}

This could be an inline function.

3.
+/*
+ * SERIALIZE_1BYTE_STR - Copy 1Byte strings to shared memory.
+ */
+#define SERIALIZE_1BYTE_STR(dest, src, copiedsize) \
+{ \
+ uint8 len = (src) ? 1 : 0; \
+ memcpy(dest + copiedsize, (uint8 *) &len, sizeof(uint8)); \
+ copiedsize += sizeof(uint8); \
+ if (src) \
+ dest[copiedsize++] = src[0]; \
+}

Similarly, this could be a function. I think keeping such things as
macros in-between code makes it difficult to read. Please see if you
can make these and similar macros as functions unless they are doing
few memory instructions. Having functions makes it easier to debug the
code as well.

[1] - https://www.postgresql.org/message-id/CAJcOf-cgfjj0NfYPrNFGmQJxsnNW102LTXbzqxQJuziar1EKfQ%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-10-29 06:21:57 Re: Use standard SIGHUP and SIGTERM handlers in autoprewarm module
Previous Message Craig Ringer 2020-10-29 06:10:56 Re: Internal key management system