Quick Links

Re: Make COPY format extendable: Extract COPY TO format implementations

From:	Sutou Kouhei <kou(at)clear-code(dot)com>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Make COPY format extendable: Extract COPY TO format implementations
Date:	2025-08-14 06:36:54
Message-ID:	20250814.153654.91221343186154858.kou@clear-code.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

In <CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC+gf=3w7XiA4LQnvx0g(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 28 Jul 2025 22:19:36 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> The fields in 1 are mostly static fields, and the fields in 2 and 3
> are likely to be accessed in hot functions during COPY FROM. Would it
> be a good idea to restructure these fields so that we can hide the
> fields in 1 from callback functions and having the fields in 3 in a
> separate format-specific struct that can be accessed via an opaque
> pointer? But could the latter change potentially cause performance
> overheads?

Yes. It changes memory layout (1 continuous memory chunk ->
2 separated memory chunks) and introduces indirect member
accesses (x->y -> x->z->y). They may not have performance
impact but we need to measure it if we want to use this
approach.

BTW, how about the following approach?

copyapi.h:

typedef struct CopyToStateData
{
/* public members */
/* ... */
} CopyToStateData;

copyto.c:

typedef struct CopyToStateInternalData
{
CopyToStateData parent;

/* private members */
/* ... */
} CopyToStateInternalData;

We export CopyToStateData only with public members. We don't
export CopyToStateInternalData that has members only for
built-in formats.

CopyToStateInternalData has the same memory layout as
CopyToStateData. So we can use CopyToStateInternalData as
CopyToStateData.

We use CopyToStateData not CopyToStateInternalData in public
API. We cast CopyToStateData to CopyToStateInternalData when
we need to use private members:

static void
CopySendData(CopyToState cstate, const void *databuf, int datasize)
{
CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize);
}

It's still direct member access.

With this approach, we can keep the same memory layout (1
continuous memory chunk) and direct member access. I think
that this approach doesn't have performance impact.

See the attached patch for PoC of this approach.

Drawback: This approach always allocates
CopyToStateInternalData not CopyToStateData. So we need to
allocate needless memories for extensions. But this will
prevent performance regression of built-in formats. Is it
acceptable?

Thanks,
--
kou

Attachment	Content-Type	Size
0001-Split-CopyToStateData-to-CopyToStateData-and-CopyToS.patch	text/x-patch	16.3 KB

In response to

Re: Make COPY format extendable: Extract COPY TO format implementations at 2025-07-29 05:19:36 from Masahiko Sawada

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	赵宇鹏 (宇彭)	2025-08-14 06:43:34	memory leak in logical WAL sender with pgoutput's cachectx
Previous Message	Kirill Reshke	2025-08-14 06:35:15	Re: VM corruption on standby