
Re: pg_dump / copy bugs with "big lines" ?

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date: 2016-02-29 18:30:23
Message-ID: 20160229183023.GA286012@alvherre.pgsql
Lists: pgsql-hackers
A customer of ours was hit by this recently and I'd like to get it fixed
for good.

Robert Haas wrote:
> On Mon, Apr 6, 2015 at 1:51 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> > In any case, I don't think it would be terribly difficult to allow a bit
> > more than 1GB in a StringInfo. Might need to tweak palloc too; ISTR there's
> > some 1GB limits there too.
> 
> The point is, those limits are there on purpose.  Changing things
> arbitrarily wouldn't be hard, but doing it in a principled way is
> likely to require some thought.  For example, in the COPY OUT case,
> presumably what's happening is that we palloc a chunk for each
> individual datum, and then palloc a buffer for the whole row.

Right.  There's a serious problem here already, and users are being
bitten by it in existing releases.  I think we should provide a
non-invasive fix for back-branches which we could apply as a bug fix.
Here's a proposed patch for the localized fix; it just adds a flag to
StringInfo allowing the string to grow beyond the 1GB limit, but only
for strings which are specifically marked.  That way we avoid having to
audit all the StringInfo-using code.  In this patch, only the COPY path
is allowed to grow beyond 1GB, which is enough to close the current
problem -- or at least my test case for it.
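The shape of that localized fix can be pictured with a minimal self-contained sketch. Note the `long_ok` flag name, the struct layout, and the helper function here are illustrative assumptions, not the patch itself; the actual change lives in PostgreSQL's StringInfo code (stringinfo.c), and the limits mirror palloc's MaxAllocSize / MaxAllocHugeSize:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors of PostgreSQL's allocation limits (values as in memutils.h). */
#define MaxAllocSize     ((size_t) 0x3fffffff)     /* 1 GB - 1 */
#define MaxAllocHugeSize ((size_t) (SIZE_MAX / 2))

/* Simplified stand-in for StringInfoData with the hypothetical flag. */
typedef struct StringInfoData
{
    char   *data;
    size_t  len;
    size_t  maxlen;
    bool    long_ok;    /* assumed flag: this string may exceed 1 GB */
} StringInfoData;

/*
 * Return true if the buffer may legally grow to 'needed' bytes.
 * Ordinary strings are capped at MaxAllocSize; only strings explicitly
 * marked long_ok (here, the COPY path) may grow toward the huge limit.
 */
static bool
string_info_can_grow(const StringInfoData *str, size_t needed)
{
    size_t limit = str->long_ok ? MaxAllocHugeSize : MaxAllocSize;

    return needed <= limit;
}
```

Because only marked strings get the larger limit, every other StringInfo user keeps the existing 1GB behavior without needing an audit.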

If others can try this patch to ensure it enables pg_dump to work on
their databases, it would be great.

(One known buglet in this patch: the UINT64_FORMAT specifier embedded
directly in the error message breaks translatability of the string.)

Like Robert, I don't like this approach for the long term, however.  I
think it would be saner to have CopySendData not keep a single gigantic
string in memory; it would be better to get the bytes out to the client
sooner than end-of-row.  To avoid causing a performance hit, we would
only flush when the size of the output buffer is about to reach the
allocation limit (MaxAllocSize); so for regular-sized tuples, we would
only do it at end-of-row, keeping the current behavior.  I don't have a
patch for this yet; I'm going to try that next.
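The flush-before-the-limit idea can be sketched like this. This is a toy stand-in rather than the real CopySendData: the tiny DEMO_LIMIT replaces MaxAllocSize so the behavior is testable without gigabyte allocations, and flush_buf stands in for actually sending the buffer to the client:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy limit standing in for MaxAllocSize (1 GB) in the real code. */
#define DEMO_LIMIT 16

typedef struct OutBuf
{
    char    data[DEMO_LIMIT];
    size_t  len;
    int     flushes;    /* counts early flushes, for demonstration */
} OutBuf;

/* Stand-in for transmitting the buffer contents to the client. */
static void
flush_buf(OutBuf *buf)
{
    buf->len = 0;
    buf->flushes++;
}

/*
 * Append 'n' bytes, flushing to the client first only if the append
 * would reach the allocation limit.  Regular-sized rows never trigger
 * the early flush, so the current end-of-row behavior is preserved.
 */
static void
send_data(OutBuf *buf, const char *src, size_t n)
{
    if (buf->len + n >= DEMO_LIMIT)
        flush_buf(buf);
    memcpy(buf->data + buf->len, src, n);
    buf->len += n;
}
```

With a 16-byte limit, two 10-byte appends force exactly one early flush; smaller appends accumulate untouched, which is the no-performance-hit property described above.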

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: huge-stringinfo.patch
Description: text/x-diff (4.7 KB)
