Re: pg_dump / copy bugs with "big lines" ?

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date: 2015-04-08 05:06:42
Message-ID: 5524B762.5060407@BlueTreble.com
Lists: pgsql-hackers

On 4/7/15 10:29 PM, Michael Paquier wrote:
> On Wed, Apr 8, 2015 at 11:53 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Apr 6, 2015 at 1:51 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
>>> In any case, I don't think it would be terribly difficult to allow a bit
>>> more than 1GB in a StringInfo. Might need to tweak palloc too; ISTR there's
>>> some 1GB limits there too.
>>
>> The point is, those limits are there on purpose. Changing things
>> arbitrarily wouldn't be hard, but doing it in a principled way is
>> likely to require some thought. For example, in the COPY OUT case,
>> presumably what's happening is that we palloc a chunk for each
>> individual datum, and then palloc a buffer for the whole row. Now, we
>> could let the whole-row buffer be bigger, but maybe it would be better
>> not to copy all of the (possibly very large) values for the individual
>> columns over into a row buffer before sending it. Some refactoring
>> that avoids the need for a potentially massive (1.6TB?) whole-row
>> buffer would be better than just deciding to allow it.
>
> I think that something to be aware of is that this is as well going to
> require some rethinking of the existing libpq functions that are here
> to fetch a row during COPY with PQgetCopyData, to make them able to
> fetch chunks of data from one row.

The discussion about upping the StringInfo limit was for cases where an
encoding conversion blows past the limit because the converted output is
larger than the stored data. My impression was that those cases don't
expand by much, so we wouldn't be significantly expanding StringInfo.

I agree that buffering 1.6TB of data would be patently absurd. Handling
the case of COPYing a row that's >1GB clearly needs more work than just
bumping up some size limits. That's why I was wondering whether this was
a real scenario or just hypothetical... I'd be surprised if someone
would be happy with the performance of 1GB tuples, let alone even larger
than that.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
