Re: pg_dump / copy bugs with "big lines" ?

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>, 'pgsql-hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date: 2015-03-30 23:45:41
Message-ID: 5519E025.3030704@BlueTreble.com
Lists: pgsql-hackers

On 3/30/15 5:46 AM, Ronan Dunklau wrote:
> Hello hackers,
>
> I've tried my luck on pgsql-bugs before, with no success, so I'm reporting these
> problems here.
>
> The documentation mentions the following limits for sizes:
>
> Maximum Field Size: 1 GB
> Maximum Row Size: 1.6 TB
>
> However, it seems like rows bigger than 1GB can't be COPYed out:
>
> ro=# create table test_text (c1 text, c2 text);
> CREATE TABLE
> ro=# insert into test_text (c1) VALUES (repeat('a', 536870912));
> INSERT 0 1
> ro=# update test_text set c2 = c1;
> UPDATE 1
>
> Then, trying to dump or copy that results in the following error:
>
> ro=# COPY test_text TO '/tmp/test';
> ERROR: out of memory
> DETAIL: Cannot enlarge string buffer containing 536870913 bytes by 536870912
> more bytes.
>
> In fact, the same thing happens when using a simple SELECT:
>
> ro=# select * from test_text ;
> ERROR: out of memory
> DETAIL: Cannot enlarge string buffer containing 536870922 bytes by 536870912
> more bytes.
>
> In the case of COPY, the server uses a StringInfo to output the row. The
> problem is, a StringInfo is capped at MaxAllocSize (1GB - 1), but a row
> should be able to hold much more than that.

Yeah, shoving a whole row into one StringInfo is ultimately going to
limit a row to 1GB, which is a far cry from what the docs claim. There
are also going to be problems with FE/BE communications, because things
like pq_sendbyte all use a StringInfo as a buffer too. So while Postgres
can store a 1.6TB row, you're going to find a bunch of stuff that
doesn't work past around 1GB.
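
For reference, the cap that error is hitting lives in the string-buffer
enlargement path. Roughly (a trimmed sketch of enlargeStringInfo(), not
the verbatim source):

/* Trimmed sketch of the backend's buffer-growth check; the real code
 * lives in src/backend/lib/stringinfo.c. */
void
enlargeStringInfo(StringInfo str, int needed)
{
    int     newlen;

    if (needed < 0)             /* should not happen */
        elog(ERROR, "invalid string enlargement request size: %d", needed);

    /* Anything that would push the buffer past MaxAllocSize (1GB - 1)
     * is rejected outright -- this is the "Cannot enlarge string buffer"
     * error that COPY and SELECT are reporting. */
    if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
        ereport(ERROR,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of memory"),
                 errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
                           str->len, needed)));

    needed += str->len + 1;     /* total space required now */
    if (needed <= str->maxlen)
        return;                 /* already have enough room */

    /* Double the allocation until it fits, but never past MaxAllocSize. */
    newlen = 2 * str->maxlen;
    while (needed > newlen)
        newlen = 2 * newlen;
    if (newlen > (int) MaxAllocSize)
        newlen = (int) MaxAllocSize;

    str->data = (char *) repalloc(str->data, newlen);
    str->maxlen = newlen;
}

Every per-row output path that funnels through a buffer like that hits
the same wall, whether it's COPY or the regular wire protocol.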

> So, is this a bug? Or is there a caveat I would have missed in the
> documentation?

I suppose that really depends on your point of view. The real question
is whether we think it's worth fixing, or a good idea to change the
behavior of StringInfo.

StringInfo uses ints to store its length, so that could possibly be
changed, but then you'd just error out at MaxAllocSize instead.

Now perhaps those could both be relaxed, but certainly not to the extent
that you can shove an entire 1.6TB row into an output buffer.
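
For concreteness, the buffer those paths accumulate into looks roughly
like this (field names as in the headers; comments mine):

/* Sketch of StringInfoData from lib/stringinfo.h (comments added). */
typedef struct StringInfoData
{
    char   *data;       /* the accumulated bytes */
    int     len;        /* current length; a signed 32-bit int, so ~2GB at best */
    int     maxlen;     /* allocated size; never allowed past MaxAllocSize (1GB - 1) */
    int     cursor;     /* read position, used on the receive side */
} StringInfoData;

typedef StringInfoData *StringInfo;

Even if those fields were widened and MaxAllocSize raised, buffering an
entire 1.6TB row in memory before sending it still isn't going to fly.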

The other issue is that there are a LOT of places in the code that
blindly copy detoasted data around, so while we technically support 1GB
toasted values you're probably going to be quite unhappy with the
performance. I'm actually surprised you haven't already seen this with
500MB values.

So long story short, I'm not sure how worthwhile it would be to try to
fix this. We should probably improve the docs, though.

Have you looked at using large objects for what you're doing? (Note that
those have their own set of challenges and limitations.)
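
If it helps, here's a rough libpq sketch of the large-object route
(connection string and file path are placeholders; error handling
trimmed):

/* Hypothetical example: store a multi-GB payload as a large object
 * instead of a single text/bytea column.  Values are placeholders. */
#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
    PGconn   *conn = PQconnectdb("dbname=mydb");        /* placeholder conninfo */
    PGresult *res;
    Oid       loid;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Large-object operations should run inside a transaction. */
    res = PQexec(conn, "BEGIN");
    PQclear(res);

    /* lo_import streams the file up in chunks, so it isn't subject to
     * the 1GB-per-field limit that text/bytea columns are. */
    loid = lo_import(conn, "/tmp/big_payload.bin");     /* placeholder path */
    if (loid == InvalidOid)
        fprintf(stderr, "lo_import failed: %s", PQerrorMessage(conn));
    else
        printf("stored as large object %u\n", loid);

    res = PQexec(conn, "COMMIT");
    PQclear(res);
    PQfinish(conn);
    return 0;
}

You'd then keep the returned OID in your table and read the data back
with lo_export() or the lo_read() family.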

> We also hit a second issue, this time related to bytea encoding.

There are probably several other places where this type of thing could
be a problem. I'm thinking of conversions in particular.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
