From: | Lukas Eder <lukas(dot)eder(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org |
Subject: | Re: BUG #19063: Heavily nesting trivial ROW projections produces out of memory error |
Date: | 2025-09-25 08:30:34 |
Message-ID: | CAB4ELO4Jei0oLb9QALL5jY8aSe5MwZQnNjMx6WS-Jw70gMQCxg@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Thanks for the explanation. I keep forgetting, even if I've worked on
parsing code many times.
I understand backwards compatibility concerns, but this particular case
does seem to show an extreme waste, especially since I don't think it's
really necessary here. I don't see the ambiguity of nesting parentheses as
(((1))), for example. It appears that the escaping rules have been kept
simple in order to make parsing 1-2 edge cases simpler, at the cost of most
ordinary cases being way too verbose.
Sure, nesting 32 levels without even actual named composite types is not
realistic, as this was just a client side integration test for client side
logic. But nesting 5 levels, including named composite types is not unseen,
and that already incurs 2^4 = 16 unnecessary double quotes for each data
item on each row.
In my opinion, the rich system of user-defined data structures is
PostgreSQL's biggest strength, even compared to Oracle, the other popular
ORDBMS. If there's such a significant performance penalty to using them,
people simply might not.
Of course, there's the workaround of serialising JSON to clients instead of
the native text format...
On Wed, Sep 24, 2025 at 8:43 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > I don't understand how such excessive memory consumption is produced by
> such
> > a "simple" query.
>
> Compare
>
> regression=# select row(row(1));
> row
> ---------
> ("(1)")
> (1 row)
>
>
> regression=# select row(row(row(1)));
> row
> ---------------
> ("(""(1)"")")
> (1 row)
>
> regression=# select row(row(row(row(1))));
> row
> -------------------------
> ("(""(""""(1)"""")"")")
> (1 row)
>
> Each layer of ROW() has to quote the string representing the next
> lower layer, so it doubles all the double-quotes per the quoting
> rules at
>
> https://www.postgresql.org/docs/current/rowtypes.html#ROWTYPES-IO-SYNTAX
>
> So the output length nearly doubles for each layer of ROW(), and yes
> it doesn't take that many layers to get to something excessive.
>
> Maybe we could redefine the quoting rules in a way that's friendlier
> to this sort of thing, but it would surely make them a lot more
> complicated, which would likely introduce enough client-side bugs
> to make the change unattractive. Not to mention the whole backwards
> compatibility question.
>
> On the whole I can't get excited about this, since I see no very
> good reason to use such a data structure.
>
> regards, tom lane
>
From | Date | Subject | |
---|---|---|---|
Next Message | Devrim Gündüz | 2025-09-25 10:54:26 | Re: BUG #19058: Empty repomd.xml.asc file |
Previous Message | Michael Paquier | 2025-09-24 23:14:46 | Re: TRAP: failed Assert("outerPlan != NULL") in postgres_fdw.c |