Re: remaining sql/json patches

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, jian he <jian(dot)universality(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: remaining sql/json patches
Date: 2023-11-24 09:32:06
Message-ID: CA+HiwqHGOL8a5MjfOd_jbWMontKizqjoqCf5HtqCXhNjnpUktw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 23, 2023 at 4:38 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2023-11-21 12:52:35 +0900, Amit Langote wrote:
> > version gram.o text bytes %change gram.c bytes %change
> >
> > 9.6 534010 - 2108984 -
> > 10 582554 9.09 2258313 7.08
> > 11 584596 0.35 2313475 2.44
> > 12 590957 1.08 2341564 1.21
> > 13 590381 -0.09 2357327 0.67
> > 14 600707 1.74 2428841 3.03
> > 15 633180 5.40 2495364 2.73
> > 16 653464 3.20 2575269 3.20
> > 17-sqljson 672800 2.95 2709422 3.97
> >
> > So if we put SQL/JSON (including JSON_TABLE()) into 17, we end up with a gram.o 2.95% larger than v16, which granted is a somewhat larger bump, though also smaller than with some of recent releases.
>
> I think it's ok to increase the size if it's necessary increases - but I also
> think we've been a bit careless at times, and that that has made the parser
> slower. There's probably also some "infrastructure" work we could do combat
> some of the growth too.
>
> I know I triggered the use of the .c bytes and text size, but it'd probably
> more sensible to look at the size of the important tables generated by bison.
> I think the most relevant defines are:
>
> #define YYLAST 117115
> #define YYNTOKENS 521
> #define YYNNTS 707
> #define YYNRULES 3300
> #define YYNSTATES 6255
> #define YYMAXUTOK 758
>
>
> I think a lot of the reason we end up with such a big "state transition" space
> is that a single addition to e.g. col_name_keyword or unreserved_keyword
> increases the state space substantially, because it adds new transitions to so
> many places. We're in quadratic territory, I think. We might be able to do
> some lexer hackery to avoid that, but not sure.

One thing I noticed when looking at the raw parsing times across
versions is that they improved a bit around v12 and then some in v13:

9.0 0.000060 s
9.6 0.000061 s
10 0.000061 s
11 0.000063 s
12 0.000055 s
13 0.000054 s
15 0.000057 s
16 0.000059 s

I think they might be due to the following commits in v12 and v13 resp.:

commit c64d0cd5ce24a344798534f1bc5827a9199b7a6e
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Wed Jan 9 19:47:38 2019 -0500
Use perfect hashing, instead of binary search, for keyword lookup.
...
Discussion: https://postgr.es/m/20190103163340.GA15803@britannica.bec.de

commit 7f380c59f800f7e0fb49f45a6ff7787256851a59
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Mon Jan 13 15:04:31 2020 -0500
Reduce size of backend scanner's tables.
...
Discussion:
https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com

I haven't read the whole discussions there to see if the target(s)
included the metrics you've mentioned though, either directly or
indirectly.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-11-24 09:45:10 Re: Synchronizing slots from primary to standby
Previous Message Drouvot, Bertrand 2023-11-24 09:16:41 Re: Synchronizing slots from primary to standby