Re: Make COPY format extendable: Extract COPY TO format implementations

From: Junwang Zhao <zhjwpku(at)gmail(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Sutou Kouhei <kou(at)clear-code(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2023-12-06 14:32:14
Message-ID: CAEG8a3+D3B7zKOyq4LRgV8_Fx6n7-rZ4hN+wzD7UgU22+8BQzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 6, 2023 at 8:32 PM Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
>
> Sutou Kouhei wrote:
>
> > * 2022-04: Apache Arrow [2]
> > * 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]
> >
> > (FYI: I want to add support for Apache Arrow.)
> >
> > There were discussions how to add support for more formats. [3][4]
> > In these discussions, we got a consensus about making COPY
> > format extendable.
>
>
> These formats seem all column-oriented whereas COPY is row-oriented
> at the protocol level [1].
> With regard to the procotol, how would it work to support these formats?
>

They have kind of *RowGroup* concepts, a bunch of rows goes to a RowBatch
and the data of the same column goes together.

I think they should fit the COPY semantics and there are some FDW out there for
these modern formats, like [1]. If we support COPY to deal with the
format, it will
be easier to interact with them(without creating
server/usermapping/foreign table).

[1]: https://github.com/adjust/parquet_fdw

>
> [1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
>
>
> Best regards,
> --
> Daniel Vérité
> https://postgresql.verite.pro/
> Twitter: @DanielVerite
>
>

--
Regards
Junwang Zhao

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-12-06 15:06:17 Re: Is WAL_DEBUG related code still relevant today?
Previous Message Tomas Vondra 2023-12-06 14:18:21 Re: logical decoding and replication of sequences, take 2