PGDump / copy bugs with "big lines" ?

From: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: PGDump / copy bugs with "big lines" ?
Date: 2015-03-12 09:39:01
Message-ID: 3859545.fB9yxixrjG@ronan.dunklau.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hello.

The documentation mentions the following limits for sizes:

Maximum Field Size 1 GB
Maximum Row Size 1.6 TB

However, it seems like rows bigger than 1GB can't be COPYed out:

ro=# create table test_text (c1 text, c2 text);
CREATE TABLE
ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
INSERT 0 1
ro=# update test_text set c2 = c1;
UPDATE 1

Then, trying to dump or copy that results in the following error:

ro=# COPY test_text TO '/tmp/test';
ERROR: out of memory
DÉTAIL : Cannot enlarge string buffer containing 536870913 bytes by 536870912
more bytes.

In fact, the same thing happens when using a dumb SELECT:

ro=# select * from test_text ;
ERROR: out of memory
DÉTAIL : Cannot enlarge string buffer containing 536870922 bytes by 536870912
more bytes.

In the case of COPY, the server uses a StringInfo to output the row. The
problem is, a StringInfo is capped to MAX_ALLOC_SIZE (1GB - 1), but a row
should be able to hold much more than that.

So, is this a bug ? Or is there a caveat I would have missed in the
documentation ?

We also hit a second issue, this time related to bytea encoding.

This test case is a bit more complicated, since I had to use an external
(client) program to insert my data. It involves inserting a string that fit
into 1GB when encoded in escape format, but is larger than that in hex, and
another string which fits in 1GB using the hex format, but is larger than that
in escape:

from psycopg2 import connect
from io import BytesIO

conn = connect(dbname="ro")
cur = conn.cursor()
fullcontent = BytesIO()

# Write a binary string that weight less
# than 1 GB when escape encoded, but more than
# that if hex encoded
for i in range(200):
fullcontent.write(b"aaa" * 1000000)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

fullcontent.seek(0)
fullcontent.truncate()

# Write another binary string that weight
# less than 1GB when hex encoded, but more than
# that if escape encoded
cur.execute("SET bytea_output = 'hex'")
fullcontent.write(b"\\\\x")
for i in range(300):
fullcontent.write(b"00" * 1000000)
fullcontent.seek(0)
cur.copy_from(fullcontent, "test_bytea")

cur.execute("COMMIT;")
cur.close()

I couldn't find an invocation of pg_dump which would allow me to dump both
lines:

ro(at)ronan_laptop /tmp % PGOPTIONS="-c bytea_output=escape" pg_dump -Fc >
/dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult()
failed.
pg_dump: Error message from server: ERROR: invalid memory alloc request size
1200000001
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;
ro(at)ronan_laptop /tmp % PGOPTIONS="-c bytea_output=hex" pg_dump -Fc >
/dev/null
pg_dump: Dumping the contents of table "test_bytea" failed: PQgetResult()
failed.
pg_dump: Error message from server: ERROR: invalid memory alloc request size
1200000003
pg_dump: The command was: COPY public.test_bytea (c1) TO stdout;

Using a COPY with binary format works:

ro=# COPY test_bytea TO '/tmp/test' WITH BINARY;

I'm really surprised by those results: did I do anything wrong ? If it is
indeed a bug, I'm surprised it wasn't discovered before.

Thank you.

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org

Browse pgsql-bugs by date

  From Date Subject
Next Message agroszer 2015-03-12 10:43:09 BUG #12856: No space left on device, but there is space!
Previous Message Asif Naeem 2015-03-12 09:33:08 Re: pg_upgrade failure on Windows Server