Re: Improving on MAX_CONVERSION_GROWTH

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Improving on MAX_CONVERSION_GROWTH
Date: 2019-09-27 19:25:02
Message-ID: 20159.1569612302@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Sep 27, 2019 at 11:40 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Note that one of the additional reasons for the 1GB limit is that it
>> protects against int overflows. I'm somewhat unconvinced that that's a
>> sensible approach, but ...

> It's not crazy. People use 'int' rather casually, just as they use
> 'palloc' rather casually, without necessarily thinking about what
> could go wrong at the edges. I don't have any beef with that as a
> general strategy; I just think we should be trying to do better in the
> cases where it negatively affects the user experience.

A small problem with doing anything very interesting here is that the
int-is-enough-for-a-string-length approach is baked into the wire
protocol (read the DataRow message format spec and weep).
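
To make the DataRow constraint concrete, here is a rough sketch of the length arithmetic. It assumes the documented layout: an Int32 message length that counts itself, an Int16 column count, then per column an Int32 value length (-1 for NULL) followed by the value bytes. The function name datarow_message_length is made up for illustration and is not anything in the PostgreSQL tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the Int32 length word of a DataRow message
 * for the given column value lengths (a negative length means NULL).
 * Returns false if a single value, or the message as a whole, cannot be
 * described by a signed 32-bit length.
 */
bool
datarow_message_length(const int64_t *col_lens, int ncols, int32_t *out_len)
{
    /* length word (4 bytes) + column count (2 bytes) */
    int64_t total = 4 + 2;

    /* the column count travels as an Int16 */
    assert(ncols >= 0 && ncols <= INT16_MAX);

    for (int i = 0; i < ncols; i++)
    {
        int64_t len = col_lens[i];

        total += 4;             /* per-column Int32 length word */
        if (len > INT32_MAX)
            return false;       /* value length does not fit in Int32 */
        if (len > 0)
            total += len;       /* NULL and empty values add no bytes */
        if (total > INT32_MAX)
            return false;       /* whole message does not fit in Int32 */
    }

    *out_len = (int32_t) total;
    return true;
}
```

With this layout, a row holding a 5-byte value, a NULL and an empty string needs 6 + 3*4 + 5 = 23 bytes, while any single value of 2GB or more is rejected outright, which is the limit being complained about.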

We could probably bend the COPY protocol enough to support multi-gig row
values --- dropping the rule that the backend doesn't split rows across
CopyData messages wouldn't break too many clients, hopefully. That would
at least dodge some problems in dump/restore scenarios.

In the meantime, I still think we should commit what I proposed in the
other thread (<974.1569356381@sss.pgh.pa.us>), or something close to it.
Andres' proposal would perhaps be an improvement on that, but I don't
think it'll be ready anytime soon; and for sure we wouldn't risk
back-patching it, while I think we could back-patch what I suggested.
In any case, that patch is small enough that dropping it would be no big
loss if a better solution comes along.

Also, as far as the immediate subject of this thread is concerned,
I'm inclined to get rid of MAX_CONVERSION_GROWTH in favor of using
the target encoding's max char length. The one use (in printtup.c)
where we don't know the target encoding could use MAX_MULTIBYTE_CHAR_LEN
instead. Being smarter than that could help in some cases (mostly,
conversion of ISO encodings to UTF8), but it's not that big a win.
(I did some checks and found that some ISO encodings could provide a
max growth of 2x, but many are max 3x, so 4x isn't that far out of
line.) If Andres' ideas don't pan out we could come back and work
harder on this, but for now something simple and back-patchable
seems like a useful stopgap improvement.
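
As a rough illustration of why the growth factor matters, the sketch below sizes a conversion output buffer as source length times a per-character growth factor, plus one byte for the terminator, and rejects anything over palloc's 1GB allocation limit. The function conversion_buffer_size and its growth parameter are assumptions for this example only; growth stands in for either MAX_CONVERSION_GROWTH or the target encoding's max char length, and the real code paths differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONVERSION_GROWTH 4                 /* fixed worst-case factor */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)    /* 1GB - 1, palloc's limit */

/*
 * Hypothetical helper: worst-case output buffer size for converting
 * src_len bytes when each input byte can expand to at most "growth"
 * output bytes.  Returns false if the request would exceed the
 * allocation limit.
 */
bool
conversion_buffer_size(size_t src_len, int growth, size_t *out_size)
{
    assert(growth >= 1);

    /* divide rather than multiply first, so the check cannot overflow */
    if (src_len > (MAX_ALLOC_SIZE - 1) / (size_t) growth)
        return false;

    *out_size = src_len * (size_t) growth + 1;  /* +1 for trailing NUL */
    return true;
}
```

Under these assumptions a fixed factor of 4 caps the input at just under 256MB, whereas a single-byte target encoding (max char length 1) would allow inputs up to just under 1GB with no other change.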

regards, tom lane
