Re: Make COPY format extendable: Extract COPY TO format implementations

From: Sutou Kouhei <kou(at)clear-code(dot)com>
To: michael(at)paquier(dot)xyz
Cc: sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-10 03:00:34
Message-ID: 20240110.120034.501385498034538233.kou@clear-code.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In <ZYTfqGppMc9e_w2k(at)paquier(dot)xyz>
"Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:00:24 +0900,
Michael Paquier <michael(at)paquier(dot)xyz> wrote:

>> 3. Export CopySend*()
>>
>> * If we like minimum API, we just need to export
>> CopySendData() and CopySendEndOfRow(). But
>> CopySend{String,Char,Int32,Int16}() will be convenient
>> custom COPY TO handlers. (A custom COPY TO handler for
>> Apache Arrow doesn't need them.)
>
> Hmm. Not sure on this one. This may come down to externalize the
> manipulation of fe_msgbuf. Particularly, could it be possible that
> some custom formats don't care at all about the network order?

It means that all custom formats should control byte order
by themselves instead of using CopySendInt*() that always
use network byte order, right? It makes sense. Let's export
only CopySendData() and CopySendEndOfRow().

>> 1. What value should be used for "format" in
>> PgMsg_CopyOutResponse message?
>>
>> It's 1 for binary format and 0 for text/csv format.
>>
>> Should we make it customizable by custom COPY TO handler?
>> If so, what value should be used for this?
>
> Interesting point. It looks very tempting to give more flexibility to
> people who'd like to use their own code as we have one byte in the
> protocol but just use 0/1. Hence it feels natural to have a callback
> for that.

OK. Let's add a callback something like:

typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);

> It also means that we may want to think harder about copy_is_binary in
> libpq in the future step. Now, having a backend implementation does
> not need any libpq bits, either, because a client stack may just want
> to speak the Postgres protocol directly. Perhaps a custom COPY
> implementation would be OK with how things are in libpq, as well,
> tweaking its way through with just text or binary.

Can we defer this discussion after we commit a basic custom
COPY format handler mechanism?

>> 2. Do we need more tries for design discussion for the first
>> implementation? If we need, what should we try?
>
> A makeNode() is used with an allocation in the current memory context
> in the function returning the handler. I would have assume that this
> stuff returns a handler as a const struct like table AMs.

If we use this approach, we can't use the Sawada-san's
idea[1] that provides a convenient API to hide
CopyFormatRoutine internal. The idea provides
MakeCopy{To,From}FormatRoutine(). They return a new
CopyFormatRoutine* with suitable is_from member. They can't
use static const CopyFormatRoutine because they may be called
multiple times in the same process.

We can use the satic const struct approach by choosing one
of the followings:

1. Use separated function for COPY {TO,FROM} format handlers
as I suggested.

2. Don't provide convenient API. Developers construct
CopyFormatRoutine by themselves. But it may be a bit
tricky.

3. Similar to 2. but don't use a bit tricky approach (don't
embed Copy{To,From}FormatRoutine nodes into
CopyFormatRoutine).

Use unified function for COPY {TO,FROM} format handlers
but CopyFormatRoutine always have both of COPY {TO,FROM}
format routines and these routines aren't nodes:

typedef struct CopyToFormatRoutine
{
CopyToStart_function start_fn;
CopyToOneRow_function onerow_fn;
CopyToEnd_function end_fn;
} CopyToFormatRoutine;

/* XXX: just copied from COPY TO routines */
typedef struct CopyFromFormatRoutine
{
CopyFromStart_function start_fn;
CopyFromOneRow_function onerow_fn;
CopyFromEnd_function end_fn;
} CopyFromFormatRoutine;

typedef struct CopyFormatRoutine
{
NodeTag type;

CopyToFormatRoutine to_routine;
CopyFromFormatRoutine from_routine;
} CopyFormatRoutine;

----

static const CopyFormatRoutine testfmt_handler = {
.type = T_CopyFormatRoutine,
.to_routine = {
.start_fn = testfmt_copyto_start,
.onerow_fn = testfmt_copyto_onerow,
.end_fn = testfmt_copyto_end,
},
.from_routine = {
.start_fn = testfmt_copyfrom_start,
.onerow_fn = testfmt_copyfrom_onerow,
.end_fn = testfmt_copyfrom_end,
},
};

PG_FUNCTION_INFO_V1(copy_testfmt_handler);
Datum
copy_testfmt_handler(PG_FUNCTION_ARGS)
{
PG_RETURN_POINTER(&testfmt_handler);
}

4. ... other idea?

[1] https://www.postgresql.org/message-id/flat/CAD21AoDs9cOjuVbA_krGizAdc50KE%2BFjAuEXWF0NZwbMnc7F3Q%40mail.gmail.com#71bb03d9237252382b245dd33e705a3a

Thanks,
--
kou

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sutou Kouhei 2024-01-10 03:06:44 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message ddme 2024-01-10 02:49:20 Is the subtype_diff function in CREATE TYPE only can be C function?