Re: 7.4 COPY BINARY Format Change

From: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Lee Kindness <lkindness(at)csl(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4 COPY BINARY Format Change
Date: 2003-08-03 14:35:18
Message-ID: 16173.7590.451491.148083@kelvin.csl.co.uk
Lists: pgsql-hackers pgsql-patches

Tom,

Tom Lane writes:
> Lee Kindness <lkindness(at)csl(dot)co(dot)uk> writes:
> > I've attached a patch which lets COPY read in the 7.1 format. However
> > I'm not convinced this is the right way to go - I think the format
> > which is output by 7.4 should be identical to the 7.1 format.
>
> You are greatly underestimating the changes that occurred in COPY BINARY.
> If the format difference had been as minor as you think, I would not
> have gratuitously broken compatibility.
>
> The real change that occurred here is that the individual data fields
> go through per-datatype send/receive routines, which in addition to
> implementing a mostly machine-independent binary format also provide
> defenses against bad input data.
>
> To continue to read the old COPY BINARY format, we'd have to bypass
> those routines and allow direct read of the internal data formats.
> This was a security risk before and would be a much bigger one now,
> seeing that we allow COPY BINARY FROM STDIN to unprivileged users. It
> is trivial to crash the backend by feeding it bad internal-format
> data.

Well, in that case the docs need attention. They describe the
"envelope" surrounding the tuples, but make no mention of the format
of the tuple data itself. It is reasonable to assume that this format
is the native binary format, as in earlier releases.

I've got applications which create binary "bulkload" files which are
loaded into the database using COPY FROM. Currently they write the
data out using simple fwrite calls. What do I need to do to make this
code work with 7.4? Are there any docs describing the "binary" format
for each of the datatypes, or do I need to reverse-engineer a dump file
or look in the source? Are the routines in libpq/pqformat.c intended
to be used by client applications to read/write the binary COPY files?
If so, they also need to be documented in the libpq docs, and that
documentation linked to from the COPY docs.
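
For anyone in the same position, here is a minimal sketch of what the
fwrite calls would have to become. It assumes the file layout given in
the 7.4 COPY reference page: an 11-byte signature, a 32-bit flags word,
a 32-bit header extension length, then for each tuple a 16-bit field
count and a 32-bit byte length before each field, and a 16-bit -1 as
the file trailer, all in network byte order. It also assumes that int4
columns are sent as four big-endian bytes. The function names are
hypothetical and are not part of libpq, and only int4 columns are
handled; other datatypes have their own send formats.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Signature "PGCOPY\n\377\r\n\0": 10 explicit bytes plus the implicit NUL = 11. */
static const char PGCOPY_SIGNATURE[] = "PGCOPY\n\377\r\n";

/* Write a 16-bit value in network (big-endian) byte order. */
static int write_be16(FILE *fp, uint16_t v)
{
    unsigned char b[2];
    b[0] = (unsigned char)(v >> 8);
    b[1] = (unsigned char)(v & 0xFF);
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* Write a 32-bit value in network (big-endian) byte order. */
static int write_be32(FILE *fp, uint32_t v)
{
    unsigned char b[4];
    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)((v >> 16) & 0xFF);
    b[2] = (unsigned char)((v >> 8) & 0xFF);
    b[3] = (unsigned char)(v & 0xFF);
    return fwrite(b, 1, sizeof b, fp) == sizeof b ? 0 : -1;
}

/* File header: signature, flags = 0 (no OIDs), extension length = 0. */
int pgcopy_write_header(FILE *fp)
{
    assert(fp != NULL);
    if (fwrite(PGCOPY_SIGNATURE, 1, sizeof PGCOPY_SIGNATURE, fp)
        != sizeof PGCOPY_SIGNATURE)
        return -1;
    if (write_be32(fp, 0) != 0)
        return -1;
    return write_be32(fp, 0);
}

/* One tuple made only of non-NULL int4 fields (hypothetical helper). */
int pgcopy_write_int4_row(FILE *fp, const int32_t *vals, int16_t nfields)
{
    int i;

    assert(fp != NULL && nfields >= 0);
    if (write_be16(fp, (uint16_t)nfields) != 0)
        return -1;
    for (i = 0; i < nfields; i++) {
        /* Each field is a 4-byte length word followed by the data. */
        if (write_be32(fp, 4) != 0)
            return -1;
        if (write_be32(fp, (uint32_t)vals[i]) != 0)
            return -1;
    }
    return 0;
}

/* File trailer: a 16-bit word holding -1. */
int pgcopy_write_trailer(FILE *fp)
{
    assert(fp != NULL);
    return write_be16(fp, 0xFFFF);
}
```

A bulkload writer would call pgcopy_write_header once, then
pgcopy_write_int4_row per tuple, then pgcopy_write_trailer. Building
each value byte by byte, instead of passing the address of a host
integer to fwrite, keeps the output the same on little-endian and
big-endian machines.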

> (I don't believe that the patch works anyway, given that you aren't doing
> anything to disable use of the per-datatype receive routine. It might
> work as-is for text fields, and for integers on bigendian machines, but
> not for much else.)

Yeah, I didn't spend a lot of effort in that respect - after all, I
said myself I didn't see the patch being accepted...

> We are not going back to the pre-7.4 format. Sorry.

Well, as pointed out in my earlier message, nothing has changed which
requires the file format to change - there is no real reason the
signature is now "PGCOPY" and the integer layout field has
disappeared. The change for the byte swapping should have been
indicated by an entry in the flags field.

I am still willing to make a patch which does this (to aid those
writing COPY format files) and to fully support the reading of the old
format tuples. However, I'm not going to waste both our time if this
patch is not going to be positively considered...

I think it's worthwhile reiterating that this change will be a real
pain for PostgreSQL users when migrating to 7.4. To be honest, I'd
probably stick with 7.3 until the subsequent major release. Have a
think about what benefit this incompatibility gives users of COPY
BINARY... I can't think of much use for byte swapping when 99% of the
use of COPY BINARY FROM is to improve performance over using
INSERT. Both the reader and writer will be using the same binary
integer/float/etc formats!

So, will I look at implementing these changes? Or not?

L.
