Re: pg_dump / copy bugs with "big lines" ?

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
Cc: 'pgsql-hackers' <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date: 2015-04-06 17:51:56
Message-ID: 5522C7BC.3070705@BlueTreble.com
Lists: pgsql-hackers

On 3/31/15 3:46 AM, Ronan Dunklau wrote:
>> StringInfo uses int's to store length, so it could possibly be changed,
>> but then you'd just error out due to MaxAllocSize.
>>
>> Now perhaps those could both be relaxed, but certainly not to the extent
>> that you can shove an entire 1.6TB row into an output buffer.
> Another way to look at it would be to work in small chunks. For the first test
> case (rows bigger than 1GB), maybe the copy command could be rewritten to work
> in chunks, flushing the output more often if needed.

Possibly; I'm not sure how well the FE/BE protocol or code would
actually support that.
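
For reference, a minimal (untested) libpq sketch of how a client consumes
COPY TO STDOUT today; the table name is made up. As far as I can tell, each
PQgetCopyData() call hands back one whole row, which mirrors the backend
assembling the entire row in one StringInfo before sending it, so a chunked
approach would need some care on both sides of the wire.

/*
 * Illustrative only: consume COPY TO STDOUT row by row with libpq.
 * Error handling is minimal; "big_table" is a hypothetical table.
 */
#include <stdio.h>
#include <libpq-fe.h>

static void
dump_table(PGconn *conn)
{
    PGresult   *res = PQexec(conn, "COPY big_table TO STDOUT");

    if (PQresultStatus(res) != PGRES_COPY_OUT)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return;
    }
    PQclear(res);

    for (;;)
    {
        char   *buf;
        int     len = PQgetCopyData(conn, &buf, 0);    /* blocking mode */

        if (len == -1)
            break;                      /* end of COPY data stream */
        if (len < 0)
        {
            fprintf(stderr, "COPY error: %s", PQerrorMessage(conn));
            break;
        }
        fwrite(buf, 1, len, stdout);    /* one data row per call */
        PQfreemem(buf);
    }

    res = PQgetResult(conn);            /* collect final command status */
    PQclear(res);
}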

>> The other issue is that there's a LOT of places in code that blindly
>> copy detoasted data around, so while we technically support 1GB toasted
>> values you're probably going to be quite unhappy with performance. I'm
>> actually surprised you haven't already seen this with 500MB objects.
>>
>> So long story short, I'm not sure how worthwhile it would be to try and
>> fix this. We probably should improve the docs though.
>>
> I think that having data that can't be output by pg_dump is quite surprising,
> and if this is not fixable, I agree that it should clearly be documented.
>
>> Have you looked at using large objects for what you're doing? (Note that
>> those have their own set of challenges and limitations.)
> Yes, I have. This particular customer of ours did not mind the performance
> penalty of using bytea objects as long as it was convenient to use.

What do they do when they hit 1GB? Presumably if they're this close to
the limit they're already hitting 1GB, no? Or is this mostly hypothetical?
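
For what it's worth, here is roughly what the large-object route looks like
from libpq (error checking omitted, names and chunk size made up). Because
the value is streamed in small pieces, neither side ever needs a ~1GB
contiguous buffer; the flip side is the usual large-object caveats alluded
to above (values live outside the table as OIDs, with their own size limits
and dump/permission behaviour).

/*
 * Illustrative sketch: store a file as a large object in 64kB chunks.
 * Large-object calls must run inside a transaction.
 */
#include <stdio.h>
#include "libpq-fe.h"
#include "libpq/libpq-fs.h"             /* INV_READ / INV_WRITE */

static Oid
store_file_as_lo(PGconn *conn, FILE *fp)
{
    char        buf[64 * 1024];
    size_t      nread;
    Oid         loid;
    int         fd;

    PQclear(PQexec(conn, "BEGIN"));

    loid = lo_creat(conn, INV_READ | INV_WRITE);
    fd = lo_open(conn, loid, INV_WRITE);

    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
        lo_write(conn, fd, buf, nread); /* stream one chunk at a time */

    lo_close(conn, fd);
    PQclear(PQexec(conn, "COMMIT"));

    return loid;    /* store this OID in the table instead of a bytea */
}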

>>
>>> We also hit a second issue, this time related to bytea encoding.
>>
>> There's probably several other places this type of thing could be a
>> problem. I'm thinking of conversions in particular.
> Yes, that's what the two other test cases I mentioned are about: any conversion
> leading to a size greater than 1GB results in an error, even implicit
> conversions like doubling backslashes in the output.

I think the big issue with encoding is going to be the risk of changing
encoding and ending up with something too large to fit back into
storage. They might need to consider using something like bytea(990MB).
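
Some back-of-the-envelope numbers (illustrative only, not from the test
cases) showing how the output representation can blow past the limit even
when the stored value fits comfortably:

/*
 * Worst-case growth of a bytea value when rendered as text and escaped
 * for COPY.  The 600MB figure is made up for illustration.
 */
#include <stdio.h>

int
main(void)
{
    const double GB = 1024.0 * 1024.0 * 1024.0;
    double      stored = 600.0 * 1024.0 * 1024.0;   /* hypothetical 600MB bytea */

    double      hex_out = 2.0 * stored + 2.0;       /* "\x" plus 2 hex digits/byte */
    double      esc_worst = 5.0 * stored;           /* worst case: escape-format
                                                     * "\ooo" plus COPY doubling
                                                     * the backslash */

    printf("hex output:        %.2f GB\n", hex_out / GB);     /* ~1.17 GB */
    printf("escape worst case: %.2f GB\n", esc_worst / GB);   /* ~2.93 GB */
    return 0;
}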

In any case, I don't think it would be terribly difficult to allow a bit
more than 1GB in a StringInfo. Might need to tweak palloc as well; ISTR
there are some 1GB limits there too.
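
Purely as a hypothetical sketch (not a patch), something along these lines:
StringInfoData keeps len/maxlen as plain ints, so ~2GB is representable; the
hard stop today is enlargeStringInfo() refusing to grow past MaxAllocSize,
plus palloc()'s own 1GB cap, which repalloc_huge() (new in 9.4) already gets
around. enlargeStringInfoHuge() below is a made-up name.

/*
 * Hypothetical variant of enlargeStringInfo() that allows up to ~2GB.
 * Sketch only; a real patch would audit every caller and ereport() when
 * even INT_MAX is not enough.
 */
#include "postgres.h"
#include "lib/stringinfo.h"

static void
enlargeStringInfoHuge(StringInfo str, int64 needed)
{
    int64       newlen = str->maxlen;

    /* Already enough room (keep space for the trailing '\0')? */
    if (needed <= (int64) str->maxlen - str->len - 1)
        return;

    /* Double the buffer until it is big enough. */
    while (newlen < (int64) str->len + needed + 1)
        newlen *= 2;

    /* Cap at INT_MAX instead of MaxAllocSize; len/maxlen stay ints. */
    if (newlen > INT_MAX)
        newlen = INT_MAX;

    /* repalloc_huge() (9.4+) is not bound by the 1GB MaxAllocSize limit. */
    str->data = (char *) repalloc_huge(str->data, (Size) newlen);
    str->maxlen = (int) newlen;
}
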
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
