Re: pg_dump / copy bugs with "big lines" ?

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date: 2016-03-02 16:33:04
Message-ID: 7fd5631a-c895-4986-aaad-1ca0ec278585@mm
Lists: pgsql-hackers

Tomas Vondra wrote:

> My guess is this is a problem at the protocol level - the 'd' message is
> CopyData, and all the messages use int32 to define length. So if there's
> a 2GB row, it's likely to overflow.

Yes. Besides, the full message includes a negative length:

> postgres=# \copy big2 to /dev/null
> lost synchronization with server: got message type "d", length -1568669676

which happens to be the correct size (2726297620 bytes) when interpreted as an unsigned int32:

-1568669676 = (int) (1300UL*1024*1024*2 + 3 + 3*4 + 1 + 4)

One interpretation would be that putting an unsigned length in a
CopyData message is a protocol violation.

However, it's not clear to me that Int32 in the documentation necessarily
designates a signed integer.

Int32 is defined as:
Intn(i)

An n-bit integer in network byte order (most significant byte
first). If i is specified it is the exact value that will appear,
otherwise the value is variable. Eg. Int16, Int32(42).

There's at least one example where we use Int16 as unsigned:
the number of parameters in Bind (F) can be up to 65535.
This maximum is tested explicitly and referred to in several
places in fe-exec.

In some instances, Int32 is clearly signed, because -1 is accepted
to indicate NULLness, such as again in Bind (F) for the length of
the parameter value.

From this it seems to me that Intn is to be interpreted as
signed or unsigned on a case-by-case basis.

Back to CopyData (F & B), it's documented as:

Byte1('d')
Identifies the message as COPY data.

Int32
Length of message contents in bytes, including self.

Byten
Data that forms part of a COPY data stream. Messages sent from the
backend will always correspond to single data rows, but messages sent
by frontends might divide the data stream arbitrarily.

I don't see any hint that this length is signed, nor any reason for it
to be signed.

I guess that before the patch this didn't matter, at least for the B case,
because the backend never sent more than 1GB.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
