Re: pg_dump / copy bugs with "big lines" ?

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>,"Robert Haas" <robertmhaas(at)gmail(dot)com>,"Jim Nasby" <Jim(dot)Nasby(at)bluetreble(dot)com>,"Ronan Dunklau" <ronan(dot)dunklau(at)dalibo(dot)com>,"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump / copy bugs with "big lines" ?
Date: 2016-03-23 17:14:16
Message-ID: d3fe524a-1c78-4cb3-9814-849cd4f43fe6@mm
Lists: pgsql-hackers

Alvaro Herrera wrote:

> > tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len);
> >
> > which fails because (HEAPTUPLESIZE + len) is again considered
> > an invalid size, the size being 1468006476 in my test.
>
> Um, it seems reasonable to make this one be a huge-zero-alloc:
>
> MemoryContextAllocExtended(CurrentMemoryContext,
>                            HEAPTUPLESIZE + len,
>                            MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO)

Good, this allows the tests to go to completion! The tests in question
are dump/reload of a row with several fields totalling 1.4GB (deflated),
with COPY TO/FROM file and psql's \copy in both directions, as well as
pg_dump followed by pg_restore|psql.
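
To make the change concrete, here is roughly what the huge-zero
allocation amounts to (a sketch only, not the attached diff; the
wrapper function and its name are made up for illustration):

#include "postgres.h"
#include "access/htup_details.h"    /* HeapTuple, HEAPTUPLESIZE */

/*
 * Illustrative only; the wrapper and its name are made up.  A plain
 * palloc0() rejects requests above MaxAllocSize (a bit under 1GB);
 * the extended API accepts them when MCXT_ALLOC_HUGE is passed, and
 * MCXT_ALLOC_ZERO keeps the zeroing behaviour of palloc0().
 */
static HeapTuple
alloc_huge_tuple(Size len)
{
    return (HeapTuple)
        MemoryContextAllocExtended(CurrentMemoryContext,
                                   HEAPTUPLESIZE + len,
                                   MCXT_ALLOC_HUGE | MCXT_ALLOC_ZERO);
}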

The modified patch is attached.

It provides a useful mitigation for dumping and reloading databases
that have rows in the 1GB-2GB range, but it works only under these
limitations:

- no single field has a text representation exceeding 1GB;
- no row as text exceeds 2GB (\copy from fails beyond that; AFAICS we
  could push this to 4GB with limited changes to libpq, by
  interpreting the Int32 length field of the CopyData message as
  unsigned, as sketched below).
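
To illustrate that second point, here is a standalone sketch (not
libpq code; the helper name is made up) of what reading that length as
unsigned would amount to:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/*
 * Hypothetical sketch, not libpq code.  A CopyData message is the byte
 * 'd' followed by a 4-byte big-endian length that counts itself plus
 * the payload.  Read as a signed Int32 the payload is capped below 2GB;
 * reading the same four bytes as unsigned raises the ceiling to just
 * under 4GB, which is the "limited change" meant above.
 */
uint32_t
copydata_payload_length(const unsigned char *msg)
{
    uint32_t    netlen;

    memcpy(&netlen, msg + 1, 4);    /* msg[0] is the 'd' type byte */
    return ntohl(netlen) - 4;       /* unsigned: no wraparound at 2GB */
}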

It's also possible to go beyond 4GB per row with this patch, but only
when not going through the protocol. I've managed to produce a 5.6GB
single-row file with COPY TO file. That doesn't help with pg_dump, but
it might be useful in other situations.

In StringInfo, I've changed int64 to Size, because on 32-bit platforms
the downcast from int64 to Size is problematic, and since the rest of
the allocation routines seem to favor Size, it's more consistent
anyway.
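
For illustration only (the attached diff is authoritative), the
enlarge logic this enables would look roughly like the following,
assuming the len and maxlen fields of StringInfoData are widened to
Size:

#include "postgres.h"
#include "lib/stringinfo.h"
#include "utils/memutils.h"         /* MaxAllocHugeSize, repalloc_huge */

/*
 * Illustrative sketch, not the attached patch: a huge-capable variant
 * of enlargeStringInfo(), assuming len/maxlen are Size.  repalloc_huge()
 * lets the buffer grow past MaxAllocSize up to MaxAllocHugeSize.
 */
static void
enlarge_stringinfo_huge(StringInfo str, Size needed)
{
    Size        newlen;

    if (needed > MaxAllocHugeSize - (Size) str->len - 1)
        elog(ERROR, "string buffer would exceed maximum allowed size");

    needed += str->len + 1;         /* total space, incl. trailing NUL */
    if (needed <= (Size) str->maxlen)
        return;                     /* already large enough */

    /* double the buffer until it is big enough, as the stock code does */
    newlen = 2 * (Size) str->maxlen;
    while (needed > newlen)
        newlen = 2 * newlen;
    if (newlen > MaxAllocHugeSize)
        newlen = MaxAllocHugeSize;

    str->data = (char *) repalloc_huge(str->data, newlen);
    str->maxlen = newlen;
}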

I couldn't test on 32-bit though, as I never seem to have enough
free contiguous memory available on a 32-bit VM to handle
that kind of data.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Attachment Content-Type Size
huge-stringinfo-v2.diff text/x-patch 6.5 KB
