Broken error detection in genbki.pl

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Subject: Broken error detection in genbki.pl
Date: 2024-03-20 16:44:09
Message-ID: 19238.1710953049@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Wheeler complained over in [1] that genbki.pl fails to produce a
useful error message if there's anything wrong in a catalog data file.
He's right about that, but the band-aid patch he proposed doesn't
improve the situation much. The actual problem is that key error
messages in genbki.pl expect $bki_values->{line_number} to be set,
which it is not because we're invoking Catalog::ParseData with
$preserve_formatting = 0, and that runs a fast path that doesn't do
line-by-line processing and hence doesn't/can't fill that field.

I'm quite sure that those error cases worked as-intended when first
coded. I did not check the git history, but I suppose that somebody
added the non-preserve_formatting fast path later without any
consideration for the damage it was doing to error reporting ability.
I'm of the opinion that this change was penny-wise and pound-foolish.
On my machine, the time to re-make the bki files with the fast path
enabled is 0.597s, and with it disabled (measured with the attached
quick-hack patch) it's 0.612s. Is that savings worth future hackers
having to guess what they broke and where in a large .dat file?

As you can see from the quick-hack patch, it's kind of a mess to use
the $preserve_formatting = 1 case, because there are a lot of loops
that have to be taught to ignore comment lines, which we don't really
care about except in reformat_dat_file.pl. What I suggest doing, but
have not yet coded up, is to nuke the fast path in Catalog::ParseData
and reinterpret the $preserve_formatting argument as controlling
whether comments and whitespace are entered in the output data
structure, but not whether we parse it line-by-line. That should fix
this problem with zero change in the callers, and also buy back a
little bit of the cost compared to this quick hack.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/60EF4E11-BC1C-4034-B37B-695662D28AD2%40justatheory.com

Attachment Content-Type Size
fix-genbki-error-detection-hack.patch text/x-diff 5.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2024-03-20 17:09:33 Re: WIP Incremental JSON Parser
Previous Message Robert Haas 2024-03-20 16:43:08 Re: documentation structure