Re: Parallel copy

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-10-08 03:12:28
Message-ID: CAJcOf-fwxtoyOSKrGjybZtsU9Qx1sjHaM6gObUJqOZYC_maHgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 8, 2020 at 5:44 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:

> Attached v6 patch with the fixes.
>

Hi Vignesh,

I noticed a couple of issues when scanning the code in the following patch:

v6-0003-Allow-copy-from-command-to-process-data-from-file.patch

In the following code, it will put a junk uint16 value into *destptr
(and thus may well cause a crash) on a Big Endian architecture
(Solaris Sparc, s390x, etc.):
You're storing a (uint16) string length in a uint32 and then pulling
out the lower two bytes of the uint32 and copying them into the
location pointed to by destptr.

static void
+CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
+ uint32 *copiedsize)
+{
+ uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
+
+ memcpy(destptr, (uint16 *) &len, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ memcpy(destptr + sizeof(uint16), srcPtr, len);
+ *copiedsize += len;
+ }
+}

I suggest you change the code to:

uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
memcpy(destptr, &len, sizeof(uint16));

[I assume string length here can't ever exceed (65535 - 1), right?]

Looking a bit deeper into this, I'm wondering if in fact your
EstimateStringSize() and EstimateNodeSize() functions should be using
BUFFERALIGN() for EACH stored string/node (rather than just calling
shm_toc_estimate_chunk() once at the end, after the length of packed
strings and nodes has been estimated), to ensure alignment of start of
each string/node. Other Postgres code appears to be aligning each
stored chunk using shm_toc_estimate_chunk(). See the definition of
that macro and its current usages.

Then you could safely use:

uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
*(uint16 *)destptr = len;
*copiedsize += sizeof(uint16);
if (len)
{
memcpy(destptr + sizeof(uint16), srcPtr, len);
*copiedsize += len;
}

and in the CopyStringFromSharedMemory() function, then could safely use:

len = *(uint16 *)srcPtr;

The compiler may be smart enough to optimize-away the memcpy() in this
case anyway, but there are issues in doing this for architectures that
take a performance hit for unaligned access, or don't support
unaligned access.

Also, in CopyXXXXFromSharedMemory() functions, you should use palloc()
instead of palloc0(), as you're filling the entire palloc'd buffer
anyway, so no need to ask for additional MemSet() of all buffer bytes
to 0 prior to memcpy().

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2020-10-08 04:12:37 RE: [PATCH] Keeps tracking the uniqueness with UniqueKey
Previous Message tsunakawa.takay@fujitsu.com 2020-10-08 02:45:06 RE: [Patch] Optimize dropping of relation buffers using dlist