Re: pg_dump / copy bugs with "big lines" ?

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>
Cc: "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>,"Craig Ringer" <craig(at)2ndquadrant(dot)com>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"Robert Haas" <robertmhaas(at)gmail(dot)com>,"Jim Nasby" <Jim(dot)Nasby(at)bluetreble(dot)com>,"Ronan Dunklau" <ronan(dot)dunklau(at)dalibo(dot)com>,"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date: 2016-11-29 15:02:45
Message-ID: 9cfb98ae-e4a4-4654-bf1a-d37e7d27c075@manitou-mail.org
Lists: pgsql-hackers

Alvaro Herrera wrote:

> But I realized that doing it this way is simple enough;
> and furthermore, in any platforms where int is 8 bytes (ILP64), this
> would automatically allow for 63-bit-sized stringinfos

On such platforms there is a further problem: we can't send COPY
rows through the wire protocol when they're larger than 2GB.
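
For context, that limit comes from the protocol framing itself: every
CopyData message in the v3 protocol is a one-byte type followed by a
signed Int32 length that counts itself, so a single message cannot
carry more than INT32_MAX - 4 bytes of row data. A minimal sketch of
the framing (not the actual pqcomm code; the send side is simplified
to a plain write(), and send_copy_data is a made-up name):

  #include <arpa/inet.h>   /* htonl */
  #include <stdint.h>
  #include <unistd.h>      /* write */

  /* Sketch of framing one CopyData ('d') message.  The length field
   * is a signed Int32 that includes its own 4 bytes. */
  static void send_copy_data(int sock, const char *row, size_t rowlen)
  {
      char     type = 'd';
      uint32_t len  = htonl((uint32_t) (rowlen + 4)); /* MSB set past 2GB,
                                                       * truncated past 4GB */
      write(sock, &type, 1);
      write(sock, &len, 4);
      write(sock, row, rowlen);
  }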

Based on tests with previous iterations of the patch, which used
int64 for the StringInfo length, the backend COPY code does not
fail gracefully in that case.

What happened when trying this (on Linux x86_64) with a row between
2GB and 4GB is that the size overflows and is sent as a 32-bit
integer with the MSB set, which confuses the client side (at least
libpq cannot deal with it).
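
To make the failure mode concrete, here is a small standalone
illustration (not PostgreSQL code) of what the length field looks
like for a 3GB row once it is squeezed into 32 bits:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      uint64_t rowsize  = UINT64_C(3) * 1024 * 1024 * 1024; /* 3GB row */
      int32_t  wire_len = (int32_t) (rowsize + 4);  /* length counts itself */

      /* prints -1073741820: a negative length, from libpq's viewpoint */
      printf("actual size: %llu, length as seen by the client: %d\n",
             (unsigned long long) rowsize, wire_len);
      return 0;
  }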

Consider what would happen with the latest patch on a platform with
sizeof(int)=8 and a \copy invocation like this (on a single line, as
psql requires):

\copy (select big,big,big,big,big,big,big,big,...... FROM (select lpad('', 1024*1024*200) as big) s) TO /dev/null

where enough copies of "big" are put in the select-list to grow the
row over 2GB, and then over 4GB (eleven 200MB fields already exceed
2GB; twenty-one exceed 4GB).

On a platform with sizeof(int)=4 this should normally fail over 2GB with
"Cannot enlarge string buffer containing $X bytes by $Y more bytes"

I don't have an ILP64 environment to test on myself, but I suspect
that on such platforms it would malfunction instead of cleanly
erroring out.

One advantage of hardcoding the StringInfo limit to 2GB, independently
of sizeof(int), is that it avoids this problem entirely.

Also, without this limit, we can "COPY FROM/TO file" really huge rows,
4GB and beyond, as I did successfully in the tests mentioned upthread
(again with len as int64 on x86_64).
Such COPYs would then succeed or fail depending on whether they go
through a file or a network connection.
Do we want this difference in behavior?
(Keeping in mind that both will fail anyway if any individual field
in the row is larger than 1GB.)

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
