Re: Make COPY extendable in order to support Parquet and other formats

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make COPY extendable in order to support Parquet and other formats
Date: 2022-06-21 09:56:21
Message-ID: CAJ7c6TPcsFScSneXHJShZAfatcYS-VqX+TtVU8TAmHwnVoTioQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh,

> An extension just for COPY to/from parquet looks limited in
> functionality. Shouldn't this be viewed as an FDW or Table AM support
> for parquet or other formats? Of course the later is much larger in
> scope compared to the first one. But there may already be efforts
> underway
> https://www.postgresql.org/about/news/parquet-s3-fdw-01-was-newly-released-2179/

Many thanks for sharing your thoughts on this!

We are using parquet_fdw [2] but this is a read-only FDW.

What users typically need is to dump their data as fast as possible in
a given format and either to upload it to the cloud as historical data
or to transfer it to another system (Spark, etc). The data can be
accessed later if needed, as read only one.

Note that when accessing the historical data with parquet_fdw you
basically have a zero ingestion time.

Another possible use case is transferring data to PostgreSQL from
another source. Here the requirements are similar - the data should be
dumped as fast as possible from the source, transferred over the
network and imported as fast as possible.

In other words, personally I'm unaware of use cases when somebody
needs a complete read/write FDW or TableAM implementation for formats
like Parquet, ORC, etc. Also to my knowledge they are not particularly
optimized for this.

[2]: https://github.com/adjust/parquet_fdw

--
Best regards,
Aleksander Alekseev

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jakub Wartak 2022-06-21 10:20:14 RE: Use fadvise in wal replay
Previous Message Andrey Borodin 2022-06-21 09:48:55 Re: Use fadvise in wal replay