Re: Make COPY extendable in order to support Parquet and other formats

From: Aleksander Alekseev <aleksander(at)timescale(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make COPY extendable in order to support Parquet and other formats
Date: 2022-06-22 11:51:49
Message-ID: CAJ7c6TNFD84KK62xrGP-PDwPM7OESM8=TTv8TjsZpbOuNMnwGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ashutosh,

> IIUC, you want extensibility in FORMAT argument to COPY command
> https://www.postgresql.org/docs/current/sql-copy.html. Where the
> format is pluggable. That seems useful.
> Another option is to dump the data in csv format but use external
> utility to convert csv to parquet or whatever other format is. I
> understand that that's not going to be as efficient as dumping
> directly in the desired format.

Exactly. However, to clarify, I suspect this may be a bit more
involved than simply extending the FORMAT arguments.

This change per se will not be extremely useful. Currently nothing
prevents an extension author to iterate over a table using
heap_open(), heap_getnext(), etc API and dump its content in any
format. The user will have to write "dump_table(foo, filename)"
instead of "COPY ..." but that's not a big deal.

The problem is that every new extension has to re-invent things like
figuring out the schema, the validation of the data, etc. If we could
do this in the core so that an extension author has to implement only
the minimal format-dependent list of callbacks that would be really
great. In order to make the interface practical though one will have
to implement a practical extension as well, for instance, a Parquet
one.

This being said, if it turns out that for some reason this is not
realistic to deliver, ending up with simply extending this part of the
syntax a bit should be fine too.

--
Best regards,
Aleksander Alekseev

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2022-06-22 13:25:22 SYSTEM_USER reserved word implementation
Previous Message houzj.fnst@fujitsu.com 2022-06-22 11:35:07 RE: Replica Identity check of partition table on subscriber