Re: Re: COPY BINARY file format proposal

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: COPY BINARY file format proposal
Date: 2000-12-07 19:28:28
Message-ID: 13671.976217308@sss.pgh.pa.us
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
>> Just thinking that the only way an endianness flag inside the header
>> would be useful is if we pick a magic number that's a bytewise
>> palindrome.

> You could just read the 1st, 2nd, 3rd, etc bytes and require that they be
> 'P', 'G', 'C', 'P', 'Y' or some such. I *think* reading five bytes and
> doing a strcmp works...ie. don't rely on the integer value, use a string.

Oh. We could use a string instead of an integer, I suppose, although
I'm not sure I see the point for what's basically a binary format.

Given all that, here is a proposed spec for the header:

First 8 bytes: signature, ASCII "PGBCOPY\0" --- note that the null is a
required part of the signature. (This is to catch files that have been
munged by a non-8-bit-clean transfer.)
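
A minimal sketch of the signature check, in Python. The names SIGNATURE
and has_valid_signature are hypothetical and not part of the proposal;
the point is only that the comparison must cover all 8 bytes, so a file
whose trailing null was dropped is rejected.

```python
# Hypothetical names; the 8-byte value itself is taken from the proposal.
SIGNATURE = b"PGBCOPY\x00"  # seven ASCII letters plus the required null


def has_valid_signature(data: bytes) -> bool:
    """Return True only if data begins with the complete 8-byte signature."""
    # Compare raw bytes, not a C-style string, so the null is checked too.
    return data[:len(SIGNATURE)] == SIGNATURE
```

Comparing with a C-style strcmp would stop at the null and could not tell
whether the eighth byte had really been read, which is why the sketch
compares a fixed-length byte slice.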

Next 4 bytes: integer layout field. This consists of the int32 constant
0x0A820D0A expressed in the source machine's endianness. (Again, value
chosen with malice aforethought, to catch files munged by things like
DOS/Unix newline conversion or high-bit-stripping.) Potentially, a
reader could engage in byte-flipping of subsequent fields if the wrong
byte order is detected here.
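
As a sketch of how a reader might use the layout field (assuming
Python's struct module; detect_byte_order is a hypothetical helper, not
part of the proposal): interpret the 4 bytes in both byte orders and
accept whichever one yields the constant. Because the constant is not a
bytewise palindrome, at most one order can match, and a damaged field
matches neither.

```python
import struct

LAYOUT = 0x0A820D0A  # constant from the proposal


def detect_byte_order(field: bytes) -> str:
    """Return the struct prefix ('<' little-endian, '>' big-endian) of the writer.

    Raises ValueError if the 4-byte field matches neither byte order,
    which suggests the file was damaged in transfer.
    """
    if len(field) != 4:
        raise ValueError("layout field must be exactly 4 bytes")
    for prefix in ("<", ">"):
        if struct.unpack(prefix + "I", field)[0] == LAYOUT:
            return prefix
    raise ValueError("layout field damaged, or not a PGBCOPY file")
```

A big-endian writer emits the bytes 0A 82 0D 0A. Stripping the high bit
turns 82 into 02, and a DOS-to-Unix newline conversion collapses the
trailing 0D 0A into 0A, so either kind of damage makes the check fail.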

Next 4 bytes: version number, currently 1 (expressed in source machine's
endianness, as are all subsequent integer fields). A reader should
abort if it does not recognize the version number.

Next 4 bytes: length of remainder of header, not including self. In
the initial version this will be zero, and the first tuple follows
immediately. Future changes to the format might allow additional data
to be present in the header. A reader should silently ignore any header
extension data it does not know what to do with.
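
Putting the four fields together, a reader could look roughly like the
following. This is a sketch under stated assumptions, not a reference
implementation: parse_header, read_exact and KNOWN_VERSIONS are
hypothetical names, and treating the version and length as signed 32-bit
integers is an assumption the proposal does not spell out.

```python
import io
import struct

SIGNATURE = b"PGBCOPY\x00"
LAYOUT = 0x0A820D0A
KNOWN_VERSIONS = {1}  # hypothetical: versions this reader understands


def read_exact(stream, n):
    """Read exactly n bytes or raise ValueError on a short read."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated header")
    return data


def parse_header(stream):
    """Parse the header and return (byte_order_prefix, version).

    On return the stream is positioned at the first tuple.
    """
    if read_exact(stream, 8) != SIGNATURE:
        raise ValueError("bad signature")
    layout = read_exact(stream, 4)
    for prefix in ("<", ">"):
        if struct.unpack(prefix + "I", layout)[0] == LAYOUT:
            break
    else:
        raise ValueError("bad layout field")
    # Version and extension length use the writer's byte order.
    version, ext_len = struct.unpack(prefix + "ii", read_exact(stream, 8))
    if version not in KNOWN_VERSIONS:
        raise ValueError("unrecognized version %d" % version)
    if ext_len < 0:
        raise ValueError("negative header extension length")
    read_exact(stream, ext_len)  # silently skip unknown extension data
    return prefix, version
```

For example, given a big-endian header with a 3-byte extension,
parse_header(io.BytesIO(...)) skips the extension and leaves the stream
at the first tuple, without the reader needing to know what the
extension contained.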

This allows for both backwards-compatible header additions (extend the
header without changing the version number) and non-backwards-compatible
changes (bump the version number).

Since we don't yet know what we might do about the issue of
floating-point format, I left that out of the spec. It can be added to
the header extension area when and if we figure out how to do it.

Likewise, add-ons such as column names are punted until later.

Comments?

regards, tom lane
