Re: 7.4 COPY BINARY Format Change

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4 COPY BINARY Format Change
Date: 2003-08-03 15:45:53
Message-ID: 26887.1059925553@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Lee Kindness <lkindness(at)csl(dot)co(dot)uk> writes:
>>> The real change that occurred here is that the individual data fields
>>> go through per-datatype send/receive routines, which in addition to
>>> implementing a mostly machine-independent binary format also provide
>>> defenses against bad input data.

> Well in that case the docs need attention. They describe the
> "envelope" surrounding the tuples, but no mention is made of the
> format they are in. It is reasonable to assume that this format was
> the native binary format, as in earlier releases.

Yeah, there should be some mention of that in the COPY ref page I guess
--- it's mentioned in the frontend protocol chapter, but not under COPY.
In my defense I'd point out that the contents of individual fields have
never been documented under COPY.

> What do I need to do to make this
> code work with 7.4? Are there any docs describing the "binary" format
> for each of the datatypes or do I need to reverse-engineer a dump file
> or look in the source?

At the moment, I'd recommend looking in the sources to see what the
datatype send/receive routines do.
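
As a rough illustration of what those routines emit for the simplest
types, here is a minimal Python sketch. It rests on assumptions that are
not spelled out in this message and should be checked against the
sources: int4 as a 4-byte big-endian integer, float8 as an 8-byte
big-endian IEEE 754 double, text as the raw string bytes, and each field
preceded by a 32-bit length word with -1 marking NULL. The helper names
are hypothetical.

```python
import struct


def send_int4(value: int) -> bytes:
    # Assumed wire form of int4: 4 bytes, signed, network (big-endian) order.
    return struct.pack(">i", value)


def send_float8(value: float) -> bytes:
    # Assumed wire form of float8: 8-byte IEEE 754 double, big-endian.
    return struct.pack(">d", value)


def send_text(value: str, encoding: str = "utf-8") -> bytes:
    # Assumed wire form of text: the string bytes themselves, no terminator.
    # The encoding default is an illustration, not something the server mandates.
    return value.encode(encoding)


def frame_field(payload):
    # Assumed field framing: 32-bit big-endian length, then the payload.
    # A length of -1 with no payload stands for NULL.
    if payload is None:
        return struct.pack(">i", -1)
    return struct.pack(">i", len(payload)) + payload
```

Under those assumptions, frame_field(send_int4(1)) yields eight bytes: a
length word of 4 followed by the value. Anything more complex than these
three types needs its own look at the corresponding send routine.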

I have been thinking about documenting the binary formats during beta,
but am unsure where to put the info. We never documented the internal
formats before either, so there's no obvious place.

> Are the routines in libpq/pqformat.c intended
> to be used by client applications to read/write the binary COPY files?

They are not designed to be used outside the backend environment,
although possibly some enterprising person could adapt them. I am not
sure there's any value in it though. Copying the backend code helps
only if what you want to get out of the transmission is the same as the
backend's internal format, which for anything more complex than
int/float/text seems a bit dubious.

>>> We are not going back to the pre-7.4 format. Sorry.

> Well as pointed out in my earlier message nothing has changed which
> requires the format to change - there is no real reason it's now
> "PGCOPY" and the integer layout field has disappeared.

Given that the interpretation of the field contents has changed
drastically, I thought it better to make an obvious incompatible
change. We could perhaps have kept the skeleton the same, but to
what end? An app trying to read or write the file as if it were
pre-7.4 data would fail miserably anyway.
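
To show how the changed signature lets a reader refuse the wrong format
up front, here is a small Python sketch. The exact byte sequences are
assumptions taken from my reading of the COPY reference pages (an
11-byte "PGCOPY" signature for 7.4 and a 12-byte "PGBCOPY" signature
before it), and the function name is hypothetical.

```python
# Assumed 7.4 signature: 11 bytes.
NEW_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"
# Assumed pre-7.4 signature: 12 bytes, followed in the file by the
# integer layout field that 7.4 dropped.
OLD_SIGNATURE = b"PGBCOPY\n\xff\r\n\x00"


def classify_copy_header(head: bytes) -> str:
    """Classify the leading bytes of a COPY BINARY file.

    Returns "7.4", "pre-7.4", or "unknown".
    """
    if head.startswith(NEW_SIGNATURE):
        return "7.4"
    if head.startswith(OLD_SIGNATURE):
        return "pre-7.4"
    return "unknown"
```

A loader would call classify_copy_header on the first dozen bytes and
stop with a clear error on anything it was not written for, rather than
misreading the field contents further in.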

> I am still willing to make a patch which does this (to aid those
> writing COPY format files) and to fully support the reading of the old
> format tuples. However I'm not going to waste both our time if this
> patch is not going to be positively considered...

My vote will be to reject it because of the security problem.

> I can't think of much use of byte swapping when 99% of the
> use of COPY BINARY FROM is to improve performance over using
> INSERT. Both the reader and writer will be using the same binary
> integer/float/etc formats!

You must think that the universe consists exclusively of Intel hardware.
In my view, standardizing on a machine-independent binary format will
greatly *expand* the usefulness of COPY BINARY, since the files will not
be tied to a single architecture.
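
A minimal Python sketch of the portability point: a native-layout int32
written on a little-endian host is misread on a big-endian one, while a
fixed byte order round-trips everywhere. The function names are
hypothetical, and the "<" and ">" arguments merely simulate the two
kinds of host.

```python
import struct


def write_native(value: int, host_order: str) -> bytes:
    # Simulates dumping a host's in-memory int32; host_order is "<"
    # (little-endian) or ">" (big-endian).
    return struct.pack(host_order + "i", value)


def read_native(raw: bytes, host_order: str) -> int:
    # Simulates a host reinterpreting the bytes in its own layout.
    return struct.unpack(host_order + "i", raw)[0]


def write_portable(value: int) -> bytes:
    # Fixed network (big-endian) order, independent of the writing host.
    return struct.pack(">i", value)


def read_portable(raw: bytes) -> int:
    # Every reader decodes the same fixed order.
    return struct.unpack(">i", raw)[0]
```

With these helpers, read_native(write_native(0x01020304, "<"), ">")
comes back as 0x04030201, whereas read_portable(write_portable(0x01020304))
returns the original value no matter which machine runs either half.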

regards, tom lane
