Re: allocation limit for encoding conversion

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: allocation limit for encoding conversion
Date: 2019-08-16 22:04:52
Message-ID: 20190816220452.lg57jowanbsa2gf7@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-08-16 17:31:49 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > Somebody ran into issues when generating large XML output (upwards of
> > 256 MB) and then sending via a connection with a different
> > client_encoding. This occurs because we pessimistically allocate 4x as
> > much memory as the string needs, and we run into the 1GB palloc
> > limitation. ISTM we can do better now by using huge allocations, as per
> > the preliminary attached patch (which probably needs an updated overflow
> > check rather than have it removed altogether); but at least it is able
> > to process this query, which it wasn't without the patch:
>
> > select query_to_xml(
> > 'select a, cash_words(a::text::money) from generate_series(0, 2000000) a',
> > true, false, '');
>
> I fear that allowing pg_do_encoding_conversion to return strings longer
> than 1GB is just going to create failure cases somewhere else.
>
> However, it's certainly true that 4x growth is a pretty unlikely worst
> case. Maybe we could do something like
>
> 1. If string is short (say up to a few megabytes), continue to do it
> like now. This avoids adding overhead for typical cases.
>
> 2. Otherwise, run some lobotomized form of encoding conversion that
> just computes the space required (as an int64, I guess) without saving
> the result anywhere.
>
> 3. If space required > 1GB, fail.
>
> 4. Otherwise, allocate just the space required, and convert.
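
The four steps quoted above could be sketched roughly as follows. This is only an illustration, not PostgreSQL code: it uses malloc() rather than palloc(), a toy LATIN1-to-UTF-8 conversion (worst-case growth of 2x, not 4x), and the names SHORT_STRING_LIMIT, MAX_RESULT_SIZE and convert_with_exact_alloc() are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical limits: "a few megabytes" and a 1GB-style allocation cap. */
#define SHORT_STRING_LIMIT ((size_t) 4 * 1024 * 1024)
#define MAX_RESULT_SIZE    ((int64_t) 0x3fffffff)

/* Pass 1: count the bytes the conversion would produce, storing nothing. */
static int64_t
latin1_to_utf8_measure(const unsigned char *src, size_t len)
{
    int64_t need = 0;

    for (size_t i = 0; i < len; i++)
        need += (src[i] < 0x80) ? 1 : 2;
    return need;
}

/* Pass 2: the real conversion; returns the number of bytes written. */
static size_t
latin1_to_utf8_convert(const unsigned char *src, size_t len,
                       unsigned char *dst)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (src[i] < 0x80)
            dst[o++] = src[i];
        else
        {
            dst[o++] = (unsigned char) (0xC0 | (src[i] >> 6));
            dst[o++] = (unsigned char) (0x80 | (src[i] & 0x3F));
        }
    }
    return o;
}

/*
 * Returns a malloc'd, NUL-terminated result and sets *outlen,
 * or returns NULL if the result would be too large or malloc fails.
 */
unsigned char *
convert_with_exact_alloc(const unsigned char *src, size_t len,
                         size_t *outlen)
{
    int64_t        need;
    unsigned char *dst;

    if (len <= SHORT_STRING_LIMIT)
        need = (int64_t) len * 2;       /* step 1: pessimistic worst case */
    else
        need = latin1_to_utf8_measure(src, len);    /* step 2: measure only */

    if (need + 1 > MAX_RESULT_SIZE)     /* step 3: fail if too large */
        return NULL;

    dst = malloc((size_t) need + 1);    /* step 4: allocate, then convert */
    if (dst == NULL)
        return NULL;

    *outlen = latin1_to_utf8_convert(src, len, dst);
    dst[*outlen] = '\0';
    return dst;
}
```

The measuring pass costs a second scan of the input, which is why the sketch only takes it for long strings.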

It's probably too big a hammer for this specific case, but I think at
some point we ought to stop using fixed size allocations for this kind
of work. Instead we should use something roughly like our StringInfo,
except that when exceeding the current size limit, the overflowing data
is stored in a separate allocation. And only once we actually need the
data in a consecutive form, we allocate memory that's large enough to
store all the separate allocations in their entirety.
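
To make the idea a bit more concrete, here is a minimal sketch of such a buffer. It is not a patch and not based on the actual StringInfo code: ChunkedBuf, CHUNK_CAPACITY and the chunkedbuf_* functions are hypothetical names, it uses malloc() instead of palloc(), and for simplicity every chunk has the same fixed size rather than growing the first one up to the limit before overflowing.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap on the size of any single allocation. */
#define CHUNK_CAPACITY ((size_t) 64 * 1024)

typedef struct Chunk
{
    struct Chunk *next;
    size_t        used;
    char          data[];   /* CHUNK_CAPACITY bytes follow the header */
} Chunk;

typedef struct ChunkedBuf
{
    Chunk  *head;
    Chunk  *tail;
    size_t  total;          /* bytes stored across all chunks */
} ChunkedBuf;

void
chunkedbuf_init(ChunkedBuf *buf)
{
    buf->head = buf->tail = NULL;
    buf->total = 0;
}

/* Append len bytes, spilling into new chunks as needed. 0 on success, -1 on OOM. */
int
chunkedbuf_append(ChunkedBuf *buf, const char *src, size_t len)
{
    while (len > 0)
    {
        Chunk  *c = buf->tail;
        size_t  room;
        size_t  n;

        if (c == NULL || c->used == CHUNK_CAPACITY)
        {
            /* Current chunk is full: overflow goes to a separate allocation. */
            c = malloc(sizeof(Chunk) + CHUNK_CAPACITY);
            if (c == NULL)
                return -1;
            c->next = NULL;
            c->used = 0;
            if (buf->tail != NULL)
                buf->tail->next = c;
            else
                buf->head = c;
            buf->tail = c;
        }

        room = CHUNK_CAPACITY - c->used;
        n = (len < room) ? len : room;
        memcpy(c->data + c->used, src, n);
        c->used += n;
        buf->total += n;
        src += n;
        len -= n;
    }
    return 0;
}

/* Only when contiguous data is needed: one allocation of exactly total + 1 bytes. */
char *
chunkedbuf_flatten(const ChunkedBuf *buf)
{
    char   *out = malloc(buf->total + 1);
    size_t  off = 0;

    if (out == NULL)
        return NULL;
    for (const Chunk *c = buf->head; c != NULL; c = c->next)
    {
        memcpy(out + off, c->data, c->used);
        off += c->used;
    }
    out[off] = '\0';
    return out;
}

void
chunkedbuf_free(ChunkedBuf *buf)
{
    Chunk *c = buf->head;

    while (c != NULL)
    {
        Chunk *next = c->next;

        free(c);
        c = next;
    }
    chunkedbuf_init(buf);
}
```

A caller that only streams the data out (e.g. to the client) could walk the chunks directly and never call chunkedbuf_flatten() at all, which is where the savings would come from.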

Greetings,

Andres Freund
