From: | Sutou Kouhei <kou(at)clear-code(dot)com> |
---|---|
To: | sawada(dot)mshk(at)gmail(dot)com |
Cc: | michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date: | 2025-08-14 06:36:54 |
Message-ID: | 20250814.153654.91221343186154858.kou@clear-code.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
In <CAD21AoBa0Wm3C2H12jaqkvLidP2zEhsC+gf=3w7XiA4LQnvx0g(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 28 Jul 2025 22:19:36 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> The fields in 1 are mostly static fields, and the fields in 2 and 3
> are likely to be accessed in hot functions during COPY FROM. Would it
> be a good idea to restructure these fields so that we can hide the
> fields in 1 from callback functions and having the fields in 3 in a
> separate format-specific struct that can be accessed via an opaque
> pointer? But could the latter change potentially cause performance
> overheads?
Yes. It changes memory layout (1 continuous memory chunk ->
2 separated memory chunks) and introduces indirect member
accesses (x->y -> x->z->y). They may not have performance
impact but we need to measure it if we want to use this
approach.
BTW, how about the following approach?
copyapi.h:
typedef struct CopyToStateData
{
/* public members */
/* ... */
} CopyToStateData;
copyto.c:
typedef struct CopyToStateInternalData
{
CopyToStateData parent;
/* private members */
/* ... */
} CopyToStateInternalData;
We export CopyToStateData only with public members. We don't
export CopyToStateInternalData that has members only for
built-in formats.
CopyToStateInternalData has the same memory layout as
CopyToStateData. So we can use CopyToStateInternalData as
CopyToStateData.
We use CopyToStateData not CopyToStateInternalData in public
API. We cast CopyToStateData to CopyToStateInternalData when
we need to use private members:
static void
CopySendData(CopyToState cstate, const void *databuf, int datasize)
{
CopyToStateInternal cstate_internal = (CopyToStateInternal) cstate;
appendBinaryStringInfo(cstate_internal->fe_msgbuf, databuf, datasize);
}
It's still direct member access.
With this approach, we can keep the same memory layout (1
continuous memory chunk) and direct member access. I think
that this approach doesn't have performance impact.
See the attached patch for PoC of this approach.
Drawback: This approach always allocates
CopyToStateInternalData not CopyToStateData. So we need to
allocate needless memories for extensions. But this will
prevent performance regression of built-in formats. Is it
acceptable?
Thanks,
--
kou
Attachment | Content-Type | Size |
---|---|---|
0001-Split-CopyToStateData-to-CopyToStateData-and-CopyToS.patch | text/x-patch | 16.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | 赵宇鹏 (宇彭) | 2025-08-14 06:43:34 | memory leak in logical WAL sender with pgoutput's cachectx |
Previous Message | Kirill Reshke | 2025-08-14 06:35:15 | Re: VM corruption on standby |