Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: michael(at)paquier(dot)xyz, david(dot)g(dot)johnston(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, zhjwpku(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2025-09-11 05:46:27
Message-ID: 20250911.144627.1367548062212866518.kou@clear-code.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <CAD21AoBb3t7EcsjYT4w68p9OfMNwWTYsbSVaSRY6DRhi7sNRFg(at)mail(dot)gmail(dot)com>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 10 Sep 2025 00:36:38 -0700,
Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> How about another idea like we move format-specific data to another
> struct that embeds CopyFrom/ToStateData at the first field and have
> CopyFrom/ToStart callback return memory with the size of that
> struct?It resolves the concerns about adding an extra indirection
> layer and extensions doesn't need to allocate memory for unnecessary
> fields (used only for built-in formats). While extensions can access
> the internal fields I think we can live with that given that there are
> some similar precedents such as table AM's scan descriptions.

The another idea looks like the following, right?

struct CopyToStateBuiltInData
{
struct CopyToStateData parent;

/* Members for built-in formats */
...;
}

typedef CopyToState *(*CopyToStart) (void);

CopyToState
BeginCopyTo(..., CopyToStart copy_to_start)
{
...;

/* Allocate workspace and zero all fields */
cstate = copy_to_start();
...;
}

This idea will almost work. But we can't know which
CopyToStart should be used before we parse "FORMAT" option
of COPY.

If we can iterate options twice in BeginCopy{To,From}(), we
can know it. For example:

BeginCopyTo(...)
{
...;

CopyToStart copy_to_start = NULL;
foreach(option, options)
{
DefElem *defel = lfirst_node(DefElem, option);

if (strcmp(defel->defname, "format") == 0)
{
char *fmt = defGetString(defel);
if (strcmp(fmt, "text") == 0 ||
strcmp(fmt, "csv") == 0 ||
strcmp(fmt, "binary") == 0) {
/* Use the builtin cstate */
} else {
copy_to_start = /* Detect CopyToStart for custom format */;
}
}
}
if (copy_to_start)
cstate = copy_to_start();
else
cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateBuiltInData));
...;
}

(It may be better that we add
Copy{To,From}Routine::Copy{To,From}Allocate() instead of
CopyToStart callback.)

I think that this is acceptable because this must be a light
process. This must not have negative performance impact.

Thanks,
--
kou

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-09-11 05:53:01 Re: someone else to do the list of acknowledgments
Previous Message Corey Huinker 2025-09-11 05:44:49 Re: someone else to do the list of acknowledgments