Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-08-14 06:36:54
Message-ID: 20250814.153654.91221343186154858.kou@clear-code.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC+gf=3w7XiA4LQnvx0g(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 28 Jul 2025 22:19:36 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> The fields in 1 are mostly static fields, and the fields in 2 and 3
> are likely to be accessed in hot functions during COPY FROM. Would it
> be a good idea to restructure these fields so that we can hide the
> fields in 1 from callback functions and having the fields in 3 in a
> separate format-specific struct that can be accessed via an opaque
> pointer? But could the latter change potentially cause performance
> overheads?

Yes. It changes memory layout (1 continuous memory chunk ->
2 separated memory chunks) and introduces indirect member
accesses (x->y -> x->z->y). They may not have performance
impact but we need to measure it if we want to use this
approach.

BTW, how about the following approach?

copyapi.h:

typedef struct CopyToStateData
{
/* public members */
/* ... */
} CopyToStateData;

copyto.c:

typedef struct CopyToStateInternalData
{
CopyToStateData parent;

/* private members */
/* ... */
} CopyToStateInternalData;

We export CopyToStateData only with public members. We don't
export CopyToStateInternalData that has members only for
built-in formats.

CopyToStateInternalData has the same memory layout as
CopyToStateData. So we can use CopyToStateInternalData as
CopyToStateData.

We use CopyToStateData not CopyToStateInternalData in public
API. We cast CopyToStateData to CopyToStateInternalData when
we need to use private members:

static void
CopySendData(CopyToState cstate, const void *databuf, int datasize)
{
CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize);
}

It's still direct member access.

With this approach, we can keep the same memory layout (1
continuous memory chunk) and direct member access. I think
that this approach doesn't have performance impact.

See the attached patch for PoC of this approach.

Drawback: This approach always allocates
CopyToStateInternalData not CopyToStateData. So we need to
allocate needless memories for extensions. But this will
prevent performance regression of built-in formats. Is it
acceptable?

Thanks,
--
kou

Attachment Content-Type Size
0001-Split-CopyToStateData-to-CopyToStateData-and-CopyToS.patch text/x-patch 16.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message 赵宇鹏 (宇彭) 2025-08-14 06:43:34 memory leak in logical WAL sender with pgoutput's cachectx
Previous Message Kirill Reshke 2025-08-14 06:35:15 Re: VM corruption on standby