Re: raw output from copy

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "Dickson S(dot) Guedes" <listas(at)guedesoft(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavel Golub <pavel(at)microolap(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: raw output from copy
Date: 2015-07-25 16:41:18
Message-ID: CAFj8pRAx0p3X9T=VB0vpnbG7byx+jv8GrQL6zvmX_My+dq4xnw@mail.gmail.com
Lists: pgsql-hackers

2015-07-23 22:05 GMT+02:00 Dickson S. Guedes <listas(at)guedesoft(dot)net>:

> 2015-07-07 3:32 GMT-03:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> >
> > Hi
> >
> > The previous patch was broken and buggy.
> >
> > Here is a new version with fixed upload and more tests.
> >
> > The interesting thing is that I did not have to modify the interface or
> > the client, so it should work with any current driver with protocol
> > support >= 3.
>

Hi

>
> Hi Pavel,
>
> Here are some thoughts:
>
> 1) from docs: "only row data in network byte order are exported or
> imported."
>
> Should it be "only raw data"?
>

I don't quite understand. It uses PostgreSQL's built-in "send" functions,
and the result of these functions is defined as "in network byte order".
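
As a rough illustration of what "network byte order" means for a single value, the sketch below serializes a 32-bit integer most-significant byte first, which is the layout a binary send function is expected to produce for an int4. The helper name `put_int32_be` is made up for this example and is not taken from the patch or from the PostgreSQL sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): store v into buf[0..3]
 * in network (big-endian) byte order, independent of host endianness.
 */
void
put_int32_be(uint32_t v, unsigned char *buf)
{
    assert(buf != NULL);
    buf[0] = (unsigned char) (v >> 24);   /* most significant byte first */
    buf[1] = (unsigned char) (v >> 16);
    buf[2] = (unsigned char) (v >> 8);
    buf[3] = (unsigned char) v;           /* least significant byte last */
}
```

For example, `put_int32_be(258, buf)` leaves the bytes 00 00 01 02 in `buf`, whatever the host's endianness.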

>
> 2) from docs "Because this format doesn't support any delimiter, only
> one value can be exported or imported. NULL values are not allowed."
>
> That "only one value can be exported or imported" is a little sad for
> someone with a table with more than one column that accepts bytea. The
> implemented feature doesn't cover the use case where a table 'image'
> has columns: id integer, image bytea, thumbnail bytea, and I want to
> import binary data into it. We could also mention here the cases where
> we have NOT NULL columns. Since these are expected and the error messages
> complain about them, couldn't they be covered in the docs more explicitly?
>

This mode is not meant to replace the current COPY binary mode. RAW binary
output for multiple fields is a terribly complex task: you could use fixed
lengths, you could use some special separator, etc. I remember terribly
complex bulk loading on Oracle or MSSQL, and I would like to design it
differently. I prefer to keep the COPY statement as simple as possible. If
you need to import/export all fields in a record, then you can:

1. use the new LO API (for import): load the binary files as LOs, INSERT,
and drop the used LOs,
2. call more COPY statements, and join the exported files with operating
system tools (for export),
3. write a specialized application that supports the COPY API and
exports/imports data in your preferred format.

The same complexity applies to input, and I would not like to write a
generic binary file parser.
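
To make the complexity argument concrete, here is a minimal sketch of one possible framing for several binary fields in a single stream: each field is preceded by a 4-byte length in network byte order. Nothing like this exists in the patch; the function names `frame_append` and `frame_next` and the layout itself are assumptions for illustration only. Even this simple layout forces decisions that the raw mode avoids, such as how to represent NULL and how to report truncated input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical framing: [4-byte big-endian length][payload] per field.
 * Appends one field to out (capacity cap, bytes already used in *used).
 * Returns 0 on success, -1 if the field does not fit.
 */
int
frame_append(unsigned char *out, size_t cap, size_t *used,
             const unsigned char *data, uint32_t len)
{
    assert(out != NULL && used != NULL);
    if (*used > cap || cap - *used < 4 || cap - *used - 4 < len)
        return -1;
    out[*used + 0] = (unsigned char) (len >> 24);
    out[*used + 1] = (unsigned char) (len >> 16);
    out[*used + 2] = (unsigned char) (len >> 8);
    out[*used + 3] = (unsigned char) len;
    if (len > 0)
        memcpy(out + *used + 4, data, len);
    *used += 4 + (size_t) len;
    return 0;
}

/*
 * Reads the field that starts at *pos, sets *data and *len to its payload,
 * and advances *pos past it. Returns 0 on success, -1 on truncated input.
 */
int
frame_next(const unsigned char *in, size_t size, size_t *pos,
           const unsigned char **data, uint32_t *len)
{
    uint32_t n;

    assert(in != NULL && pos != NULL && data != NULL && len != NULL);
    if (*pos > size || size - *pos < 4)
        return -1;
    n = ((uint32_t) in[*pos] << 24) | ((uint32_t) in[*pos + 1] << 16) |
        ((uint32_t) in[*pos + 2] << 8) | (uint32_t) in[*pos + 3];
    if (size - *pos - 4 < n)
        return -1;
    *data = in + *pos + 4;
    *len = n;
    *pos += 4 + (size_t) n;
    return 0;
}
```

A writer would call `frame_append` once per column value, and a reader would loop on `frame_next` until it returns -1. A fixed-length or separator-based layout would need different, equally format-specific code.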

>
> 3) from code: "bool row_processed; /* true, when first row was processed
> */"
>

In this mode there is only one row, so first_row_processed sounds a little
bit strange.

>
> Maybe rename the variable to something like `first_row_processed` and
> rip off the comment?
>
> 4) from code:
>
> if (cstate->raw)
>     format = 2;
> else if (cstate->binary)
>     format = 1;
> else
>     format = 0;
>
> Maybe create a constant for code readability?
>

good idea
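
One way to read the suggestion is an enum in place of the bare 0/1/2 values. The names below (`CopyWireFormat`, `COPY_WIRE_TEXT` and so on) and the cut-down `CopyStateSketch` struct are invented for this sketch; only the numeric values and the `raw`/`binary` flags come from the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the format codes used in the quoted snippet. */
typedef enum CopyWireFormat
{
    COPY_WIRE_TEXT = 0,
    COPY_WIRE_BINARY = 1,
    COPY_WIRE_RAW = 2
} CopyWireFormat;

/* Cut-down stand-in for the real copy state; only the two flags matter here. */
typedef struct CopyStateSketch
{
    bool raw;
    bool binary;
} CopyStateSketch;

/* Same decision order as the quoted code: raw is checked before binary. */
CopyWireFormat
copy_wire_format(const CopyStateSketch *cstate)
{
    assert(cstate != NULL);
    if (cstate->raw)
        return COPY_WIRE_RAW;
    else if (cstate->binary)
        return COPY_WIRE_BINARY;
    else
        return COPY_WIRE_TEXT;
}
```

With something along these lines, the call site would read `format = copy_wire_format(cstate);` and the meaning of each value would be visible from its name.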

>
>
> If on the one hand this feature does not cover a more generalized case,
> on the other it is a nice start, IMHO.
>

That is exactly what I don't want: the complexity of usage can go sky-high
with generic binary format file processing.

Regards

Pavel

>
> --
> Dickson S. Guedes
> mail/xmpp: guedes(at)guedesoft(dot)net - skype: guediz
> http://github.com/guedes - http://guedesoft.net
> http://www.postgresql.org.br
>
