Re: Make COPY format extendable: Extract COPY TO format implementations

From: Hannu Krosing <hannuk(at)google(dot)com>
To: Junwang Zhao <zhjwpku(at)gmail(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Sutou Kouhei <kou(at)clear-code(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "nathandbossart(at)gmail(dot)com" <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-09 11:38:46
Message-ID: CAMT0RQRqVo4fGDWHqOn+wr_eoiXQVfyC=8-c=H=y6VcNxi6BvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Junwang

Please also see my presentation slides from last years PostgreSQL
Conference in Berlin (attached)

The main Idea is to make not just "format", but also "transport" and
"stream processing" extendable via virtual function tables.

Btw, will any of you here be in Prague next week ?
Would be a good opportunity to discuss this in person.

Best Regards
Hannu

On Sat, Dec 9, 2023 at 9:39 AM Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
>
> On Sat, Dec 9, 2023 at 10:43 AM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> >
> > Dear Junagn, Sutou-san,
> >
> > Basically I agree your point - improving a extendibility is good.
> > (I remember that this theme was talked at Japan PostgreSQL conference)
> > Below are my comments for your patch.
> >
> > 01. General
> >
> > Just to confirm - is it OK to partially implement APIs? E.g., only COPY TO is
> > available. Currently it seems not to consider a case which is not implemented.
> >
> For partially implements, we can leave the hook as NULL, and check the NULL
> at *ProcessCopyOptions* and report error if not supported.
>
> > 02. General
> >
> > It might be trivial, but could you please clarify how users can extend? Is it OK
> > to do below steps?
> >
> > 1. Create a handler function, via CREATE FUNCTION,
> > 2. Register a handler, via new SQL (CREATE COPY HANDLER),
> > 3. Specify the added handler as COPY ... FORMAT clause.
> >
> My original thought was option 2, but as Michael point, option 1 is
> the right way
> to go.
>
> > 03. General
> >
> > Could you please add document-related tasks to your TODO? I imagined like
> > fdwhandler.sgml.
> >
> > 04. General - copyright
> >
> > For newly added files, the below copyright seems sufficient. See applyparallelworker.c.
> >
> > ```
> > * Copyright (c) 2023, PostgreSQL Global Development Group
> > ```
> >
> > 05. src/include/catalog/* files
> >
> > IIUC, 8000 or higher OIDs should be used while developing a patch. src/include/catalog/unused_oids
> > would suggest a candidate which you can use.
>
> Yeah, I will run renumber_oids.pl at last.
>
> >
> > 06. copy.c
> >
> > I felt that we can create files per copying methods, like copy_{text|csv|binary}.c,
> > like indexes.
> > How do other think?
>
> Not sure about this, it seems others have put a lot of effort into
> splitting TO and From.
> Also like to hear from others.
>
> >
> > 07. fmt_to_name()
> >
> > I'm not sure the function is really needed. Can we follow like get_foreign_data_wrapper_oid()
> > and remove the funciton?
>
> I have referenced some code from greenplum, will remove this.
>
> >
> > 08. GetCopyRoutineByName()
> >
> > Should we use syscache for searching a catalog?
> >
> > 09. CopyToFormatTextSendEndOfRow(), CopyToFormatBinaryStart()
> >
> > Comments still refer CopyHandlerOps, whereas it was renamed.
> >
> > 10. copy.h
> >
> > Per foreign.h and fdwapi.h, should we add a new header file and move some APIs?
> >
> > 11. copy.h
> >
> > ```
> > -/* These are private in commands/copy[from|to].c */
> > -typedef struct CopyFromStateData *CopyFromState;
> > -typedef struct CopyToStateData *CopyToState;
> > ```
> >
> > Are above changes really needed?
> >
> > 12. CopyFormatOptions
> >
> > Can we remove `bool binary` in future?
> >
> > 13. external functions
> >
> > ```
> > +extern void CopyToFormatTextStart(CopyToState cstate, TupleDesc tupDesc);
> > +extern void CopyToFormatTextOneRow(CopyToState cstate, TupleTableSlot *slot);
> > +extern void CopyToFormatTextEnd(CopyToState cstate);
> > +extern void CopyFromFormatTextStart(CopyFromState cstate, TupleDesc tupDesc);
> > +extern bool CopyFromFormatTextNext(CopyFromState cstate, ExprContext *econtext,
> > +
> > Datum *values, bool *nulls);
> > +extern void CopyFromFormatTextErrorCallback(CopyFromState cstate);
> > +
> > +extern void CopyToFormatBinaryStart(CopyToState cstate, TupleDesc tupDesc);
> > +extern void CopyToFormatBinaryOneRow(CopyToState cstate, TupleTableSlot *slot);
> > +extern void CopyToFormatBinaryEnd(CopyToState cstate);
> > +extern void CopyFromFormatBinaryStart(CopyFromState cstate, TupleDesc tupDesc);
> > +extern bool CopyFromFormatBinaryNext(CopyFromState cstate,
> > ExprContext *econtext,
> > +
> > Datum *values, bool *nulls);
> > +extern void CopyFromFormatBinaryErrorCallback(CopyFromState cstate);
> > ```
> >
> > FYI - If you add files for {text|csv|binary}, these declarations can be removed.
> >
> > Best Regards,
> > Hayato Kuroda
> > FUJITSU LIMITED
> >
>
> Thanks for all the valuable suggestions.
>
> --
> Regards
> Junwang Zhao
>
>

Attachment Content-Type Size
Cloud-friendly COPY.pdf application/pdf 682.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2023-12-09 11:52:55 Re: Change GUC hashtable to use simplehash?
Previous Message Hannu Krosing 2023-12-09 11:32:22 Why are wal_keep_size, max_slot_wal_keep_size requiring server restart?