Re: Make COPY format extendable: Extract COPY TO format implementations

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: michael(at)paquier(dot)xyz, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-01-10 06:33:22
Message-ID: CAD21AoC_dhfS97DKwTL+2nvgBOYrmN9XVYrE8w2SuDgghb-yzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 10, 2024 at 12:00 PM Sutou Kouhei <kou(at)clear-code(dot)com> wrote:
>
> Hi,
>
> In <ZYTfqGppMc9e_w2k(at)paquier(dot)xyz>
> "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 22 Dec 2023 10:00:24 +0900,
> Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> >> 3. Export CopySend*()
> >>
> >> * If we like minimum API, we just need to export
> >> CopySendData() and CopySendEndOfRow(). But
> >> CopySend{String,Char,Int32,Int16}() will be convenient
> >> custom COPY TO handlers. (A custom COPY TO handler for
> >> Apache Arrow doesn't need them.)
> >
> > Hmm. Not sure on this one. This may come down to externalize the
> > manipulation of fe_msgbuf. Particularly, could it be possible that
> > some custom formats don't care at all about the network order?
>
> It means that all custom formats should control byte order
> by themselves instead of using CopySendInt*() that always
> use network byte order, right? It makes sense. Let's export
> only CopySendData() and CopySendEndOfRow().
>
>
> >> 1. What value should be used for "format" in
> >> PgMsg_CopyOutResponse message?
> >>
> >> It's 1 for binary format and 0 for text/csv format.
> >>
> >> Should we make it customizable by custom COPY TO handler?
> >> If so, what value should be used for this?
> >
> > Interesting point. It looks very tempting to give more flexibility to
> > people who'd like to use their own code as we have one byte in the
> > protocol but just use 0/1. Hence it feels natural to have a callback
> > for that.
>
> OK. Let's add a callback something like:
>
> typedef int16 (*CopyToGetFormat_function) (CopyToState cstate);
>
> > It also means that we may want to think harder about copy_is_binary in
> > libpq in the future step. Now, having a backend implementation does
> > not need any libpq bits, either, because a client stack may just want
> > to speak the Postgres protocol directly. Perhaps a custom COPY
> > implementation would be OK with how things are in libpq, as well,
> > tweaking its way through with just text or binary.
>
> Can we defer this discussion after we commit a basic custom
> COPY format handler mechanism?
>
> >> 2. Do we need more tries for design discussion for the first
> >> implementation? If we need, what should we try?
> >
> > A makeNode() is used with an allocation in the current memory context
> > in the function returning the handler. I would have assume that this
> > stuff returns a handler as a const struct like table AMs.
>
> If we use this approach, we can't use the Sawada-san's
> idea[1] that provides a convenient API to hide
> CopyFormatRoutine internal. The idea provides
> MakeCopy{To,From}FormatRoutine(). They return a new
> CopyFormatRoutine* with suitable is_from member. They can't
> use static const CopyFormatRoutine because they may be called
> multiple times in the same process.
>
> We can use the satic const struct approach by choosing one
> of the followings:
>
> 1. Use separated function for COPY {TO,FROM} format handlers
> as I suggested.
>
> 2. Don't provide convenient API. Developers construct
> CopyFormatRoutine by themselves. But it may be a bit
> tricky.
>
> 3. Similar to 2. but don't use a bit tricky approach (don't
> embed Copy{To,From}FormatRoutine nodes into
> CopyFormatRoutine).
>
> Use unified function for COPY {TO,FROM} format handlers
> but CopyFormatRoutine always have both of COPY {TO,FROM}
> format routines and these routines aren't nodes:
>
> typedef struct CopyToFormatRoutine
> {
> CopyToStart_function start_fn;
> CopyToOneRow_function onerow_fn;
> CopyToEnd_function end_fn;
> } CopyToFormatRoutine;
>
> /* XXX: just copied from COPY TO routines */
> typedef struct CopyFromFormatRoutine
> {
> CopyFromStart_function start_fn;
> CopyFromOneRow_function onerow_fn;
> CopyFromEnd_function end_fn;
> } CopyFromFormatRoutine;
>
> typedef struct CopyFormatRoutine
> {
> NodeTag type;
>
> CopyToFormatRoutine to_routine;
> CopyFromFormatRoutine from_routine;
> } CopyFormatRoutine;
>
> ----
>
> static const CopyFormatRoutine testfmt_handler = {
> .type = T_CopyFormatRoutine,
> .to_routine = {
> .start_fn = testfmt_copyto_start,
> .onerow_fn = testfmt_copyto_onerow,
> .end_fn = testfmt_copyto_end,
> },
> .from_routine = {
> .start_fn = testfmt_copyfrom_start,
> .onerow_fn = testfmt_copyfrom_onerow,
> .end_fn = testfmt_copyfrom_end,
> },
> };
>
> PG_FUNCTION_INFO_V1(copy_testfmt_handler);
> Datum
> copy_testfmt_handler(PG_FUNCTION_ARGS)
> {
> PG_RETURN_POINTER(&testfmt_handler);
> }
>
> 4. ... other idea?

It's a just idea but the fourth idea is to provide a convenient macro
to make it easy to construct the CopyFormatRoutine. For example,

#define COPYTO_ROUTINE(...) (Node *) &(CopyToFormatRoutine) {__VA_ARGS__}

static const CopyFormatRoutine testfmt_copyto_handler = {
.type = T_CopyFormatRoutine,
.is_from = true,
.routine = COPYTO_ROUTINE (
.start_fn = testfmt_copyto_start,
.onerow_fn = testfmt_copyto_onerow,
.end_fn = testfmt_copyto_end
)
};

Datum
copy_testfmt_handler(PG_FUNCTION_ARGS)
{
PG_RETURN_POINTER(& testfmt_copyto_handler);
}

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2024-01-10 06:37:01 Re: Parent/child context relation in pg_get_backend_memory_contexts()
Previous Message Andrei Lepikhov 2024-01-10 06:29:30 Re: Custom explain options