ECPG cleanup and fix for clang compile-time problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: ECPG cleanup and fix for clang compile-time problem
Date: 2024-04-19 02:18:34
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a patch series that aims to clean up ecpg's preprocessor
code a little and fix the compile time problems we're seeing with
late-model clang [1]. I guess whether it's a cleanup is in the eye of
the beholder, but it definitely succeeds at fixing compile time: for
me, the time needed to compile preproc.o with clang 16 drops from
104 seconds to less than 1 second. It might be a little faster at
processing input too, though that wasn't the primary goal.

The reason that clang is having a problem seems to be the large number
of virtually-duplicate semantic actions in the generated preproc.y.
So I looked for a way to allow most productions to use the default
semantic action rather than having to write anything. The core idea
of this patch is to stop returning <str> results from grammar
nonterminals and instead pass the strings back as Bison location data,
which we can do by redefining YYLTYPE as "char *". Since ecpg isn't
using Bison's location logic for error reports, and seems unlikely to
do so in future, this doesn't cost us anything. Then we can implement
a one-size-fits-most token concatenation rule in YYLLOC_DEFAULT, and
only the various handmade rules that don't want to just concatenate
their inputs need to do something different. (Within those handmade
rules, the main notational change needed is to write "@N" not "$N"
for the string value of the N'th input token, and "@@" not "$@"
for the output string value.) Aside from not giving clang
indigestion, this makes the compiled parser a little smaller since
there are fewer semantic actions that need code space.

As Andres remarked in the other thread, the script that
constructs preproc.y is undocumented and unreadable, so I spent
a good deal of time reverse-engineering and cleaning that up
before I went to work on the actual problem. Four of the six
patches in this series are in the way of cleanup and adding
documentation, with no significant behavioral changes.

The patch series comprises:

0001: pgindent the code in pgc.l and preproc.y's precursor files.
Yeah, this was my latent OCD rearing its head, but I hate looking
at or working on messy code. It did actually pay some dividends
later on, by making it easier to make bulk edits.

0002: improve the external documentation and error checking of This was basically to convince myself that I knew
what it was supposed to do before I started changing it.
The error checks did find some errors, too: in particular,
it turns out there are two unused entries in ecpg.addons.

(This implies that is completely worthless and should
be nuked: it adds build cycles and maintenance effort while failing
to reliably accomplish its one job of detecting dead rules, because
what it is testing is not the same thing that actually does.
I've not included that removal in this patch series, though.)

0003: clean up and simplify, and write some internal
documentation for it. The effort of understanding it exposed that
there was a pretty fair amount of dead or at least redundant code,
so I got rid of that. This patch changes the output preproc.y
file only to the extent of removing some blank lines that didn't
seem very useful to preserve.

0004: this is where something useful happens, specifically where
we change <str>-returning productions to return void and instead
pass back the desired output string as location data. In most
cases the productions now need no explicit semantic action at all,
allowing substantial simplification in

0005: more cleanup. I didn't want to add more memory-management
code to preproc/type.c, where mm_alloc and mm_strdup have lived
for no explicable reason. I pulled those and a couple of other
functions out to a new file util.c, so as to have a better home
for new utility code.

0006: the big problem with 0004 is that it can't use the trick
of freeing input substrings as soon as it's created the merged
string, as cat_str and friends have historically done. That's
because YYLLOC_DEFAULT runs before the rule's semantic action
if any, so that if the action does need to look at the input
strings, they'd already be freed. So 0004 is leaking memory
rather badly. Fix that by creating a notion of "local" memory
that will be reclaimed at end of statement, analogously to
short-lived memory contexts in the backend. All the string
concatenation work happens in short-lived storage and we don't
worry about getting rid of intermediate values immediately.
By making cat_str and friends work similarly, we can get rid
of quite a lot of explicit mm_strdup calls, although we do have
to add some at places where we're building long-lived data
structures. This should greatly reduce the malloc/free traffic
too, at the cost of eating somewhat more space intra-statement.

In my view 0006 is about the scariest part of this, as it's
hard to be sure that there are no use-after-free problems
wherein a pointer to a short-lived string survives past end
of statement. It gets through the ecpg regression tests
under valgrind successfully, but I don't have much faith
in the thoroughness of the code coverage of those tests.
(If our code coverage tools worked on bison/flex stuff,
maybe this'd be less scary ... but they don't.)

I'll park this in the July commitfest.

regards, tom lane


Attachment Content-Type Size
v1-0001-Clean-up-indentation-and-whitespace-inconsistenci.patch text/x-diff 176.3 KB text/x-diff 12.2 KB
v1-0003-Major-cleanup-simplification-and-documentation-of.patch text/x-diff 24.0 KB
v1-0004-Re-implement-ecpg-preprocessor-s-string-managemen.patch text/x-diff 109.3 KB
v1-0005-Move-some-functions-into-a-new-file-ecpg-preproc-.patch text/x-diff 6.4 KB
v1-0006-Improve-ecpg-preprocessor-s-memory-management.patch text/x-diff 88.3 KB


Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2024-04-19 02:21:09 RE: Disallow changing slot's failover option in transaction block
Previous Message Michael Paquier 2024-04-19 01:41:30 Re: ALTER TABLE SET ACCESS METHOD on partitioned tables