inconsistency and inefficiency in setup_conversion()

From: John Naylor <jcnaylor(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: inconsistency and inefficiency in setup_conversion()
Date: 2018-04-28 15:51:03
Message-ID: CAJVSVGWtUqxpfAaxS88vEGvi+jKzWZb2EStu5io-UPc4p9rSJg@mail.gmail.com
Lists: pgsql-hackers

Taking a close look at the result of setup_conversion(), I noticed that
wrong or at least confusing comments are applied to the functions.
Consider this family of conversions:

select conproc, conname
from pg_conversion
where conproc = 'utf8_to_win'::regproc
order by oid;
   conproc   |       conname
-------------+----------------------
 utf8_to_win | utf8_to_windows_866
 utf8_to_win | utf8_to_windows_874
 utf8_to_win | utf8_to_windows_1250
 utf8_to_win | utf8_to_windows_1251
 utf8_to_win | utf8_to_windows_1252
 utf8_to_win | utf8_to_windows_1253
 utf8_to_win | utf8_to_windows_1254
 utf8_to_win | utf8_to_windows_1255
 utf8_to_win | utf8_to_windows_1256
 utf8_to_win | utf8_to_windows_1257
 utf8_to_win | utf8_to_windows_1258
(11 rows)

Then compare the comment on the function:

select proname, description
from pg_description d
join pg_proc p on d.objoid=p.oid
where classoid = 'pg_proc'::regclass
and description ~ 'for UTF8 to WIN';
   proname   |                   description
-------------+--------------------------------------------------
 utf8_to_win | internal conversion function for UTF8 to WIN1258
(1 row)
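
To make the mechanics behind this result concrete, here is a toy model in
Python. It is not PostgreSQL code. It assumes each conversion re-issues
CREATE OR REPLACE FUNCTION followed by a comment on the same function, and
that a function holds exactly one comment. The helper names and the
"WIN" + code page spelling of the encoding names are illustrative
assumptions, inferred from the single comment shown above.

```python
# Toy model, not PostgreSQL code: each function name has one comment slot.
functions = {}  # function name -> number of times it was (re)defined
comments = {}   # function name -> current comment text


def create_or_replace_function(name):
    """Hypothetical stand-in for CREATE OR REPLACE FUNCTION."""
    functions[name] = functions.get(name, 0) + 1


def comment_on_function(name, text):
    """Hypothetical stand-in for commenting on a function; overwrites."""
    comments[name] = text


# Code pages taken from the conversion names in the query output above.
code_pages = [866, 874, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258]

for page in code_pages:
    create_or_replace_function("utf8_to_win")
    comment_on_function(
        "utf8_to_win",
        f"internal conversion function for UTF8 to WIN{page}",
    )

# One function remains, carrying only the comment applied last.
print(len(functions), comments["utf8_to_win"])
```

In this model the function is defined eleven times but ends up with a
single comment, the one for the final code page in the loop.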

Notice that the comment refers only to the last encoding created. This is
because setup_conversion.sql invokes CREATE OR REPLACE FUNCTION
utf8_to_win [...] multiple times, each time with a comment specific to
one encoding, so every invocation overwrites the previous comment. It
would be messy at best to try to construct the right comment using the
current Makefile script. It also can't be good for initdb performance to
create 44 functions just to immediately drop them. Speaking of which, in
this thread about initdb performance [1], setup_conversion() consumed the
biggest share of time. I propose to get rid of the ad hoc $(CONVERSIONS)
format and solve the comment issue, while hopefully shaving a bit more
time off of initdb. It seems our options are the following:

Solution #1 - As alluded to in [1], turn the conversions into
pg_proc.dat and pg_conversion.dat entries. Teach genbki.pl to parse
pg_wchar.h to map conversion names to numbers.
Pros:
- likely easy to do
- allows for the removal of an install target in the Makefile as well
  as ad hoc logic in MSVC
- uses a format that developers need to use anyway
Cons:
- immediately burns up 88 hard-coded OIDs and one for each time a
  conversion proc is created
- would require editing data in two catalogs every time a conversion
  proc is created
Solution #2 - Write a new script that would read all the .c files in
the various directories and output two files. These would be COPY'd
into temp tables during initdb, and then inserted into pg_proc,
pg_conversion, and pg_description using SQL.
Pros:
- eliminates all(?) manual catalog maintenance when adding new conversion procs
Cons:
- likely complex and difficult to debug
- further complicates initdb.c
- requires MSVC development

If we do anything, I'd much rather do #1, but that way is not entirely
without downsides compared to doing nothing. Any thoughts?

[1] https://www.postgresql.org/message-id/b549c8ad-f12e-aad1-9a59-b24cb3e55a17@proxel.se
