Binary Cursors, and the COPY command

From: pgsql(at)mohawksoft(dot)com
To: pgsql-hackers(at)postgresql(dot)org
Subject: Binary Cursors, and the COPY command
Date: 2004-07-26 17:41:47
Message-ID: 10519.64.119.142.34.1090863707.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-jdbc

OK, I wrote a utility for 7.3 that takes the output of a select command in
a Binary cursor and creates a binary "COPY" file.

The premise of the utility is to take the results of two or more selects
from external databases and create a single unified table.

Here are the issues:

In 7.3, COPY BINARY was machine specific and so was the output of a binary
cursor. Everything just worked fine.

In 7.4, COPY BINARY uses "network byte order," i.e. native data types are
altered to big endian if nessisary. The documentation for binary cursors
does not specify whether or not the "binary" data is native or "network
byte order."

I have a few issues with "COPY BINARY" using "network byte order," first,
it is pointless. The problem it intends to solve, i.e. transfering across
different machine types is already answered using the tried and true ascii
method.

Second, it actually makes the COPY functionality less usable. You can not
create the data outside of the database because all the data type
definitions and manipulation functions are inside the database. (Unless
you only use simple data types, of course.)

Third, if a binary cursor does not encode the binary data as "network byte
order" a binary copy can ONLY communicate between two postgreSQL databases
because the information required to go from native ordering to network
ordering is only in the database.

Lastly, the vast majority of machines in use today are intel. Meaning that
they are small endian. Except in a very rare circumstance, two machines
that would normally be able to communicate in native byte order, will
ALWAYS have to convert data.

The only use case network byte order fixes is a BINARY COPY between
different machine types, but in doing that, it forces anyone trying to add
value to postgresql or create a utility that uses COPY to reimplement all
the data type handlers outside of the database, even if they never need to
interpret or inspect the values, because they have to do this to put them
in network byte order.

I would say that the history of the word "BINARY" would tend more to
indicate incompatible machine specific data.

I would submit that the 7.4 format of data, i.e. one data size int32
instead of an int16 followed by the optional int32 is cleaner, but I would
remove the "network byte order" and put the byte order int32 back in the
header for 7.5

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc G. Fournier 2004-07-26 18:10:13 Re: CVS web interface error
Previous Message Simon Riggs 2004-07-26 16:40:54 Re: CVS web interface error

Browse pgsql-jdbc by date

  From Date Subject
Next Message Tom Lane 2004-07-26 17:41:51 Re: Problem w/ IDENT authentication
Previous Message Kris Jurka 2004-07-26 17:35:17 Re: SSL Connection Problems