| From: | Bryan Green <dbryan(dot)green(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | [PATCH] Fix severe performance regression with gettext 0.20+ on Windows |
| Date: | 2025-12-10 00:45:52 |
| Message-ID: | f6a3a152-b6d8-4731-a506-25a64e2958de@gmail.com |
| Lists: | pgsql-hackers |
Hello hackers,
I've been investigating a performance issue on Windows with recent
gettext versions (0.20.1 and later) that causes exception-heavy
workloads to run significantly slower than with gettext 0.19.8.
Starting with gettext 0.20.1, the library changed its Windows locale
handling in a way that conflicts with how PostgreSQL sets LC_MESSAGES.
The performance regression manifests when raising many exceptions:
- gettext 0.19.8: ~32 seconds for 1M exceptions
- gettext 0.20.1+: ~180 seconds for 1M exceptions
- gettext 0.2x.y+ with the attached patch: ~39 seconds for 1M exceptions
The root cause is a combination of three issues:
1. Locale format mismatch
gettext 0.20.1+ introduced a get_lcid() function that expects Windows
locale format ("English_United States.1252") rather than POSIX format
("en_US"). This function enumerates all Windows locales (~259) until a
match is found, then uses the resulting LCID to determine the catalog path.
PostgreSQL, however, has always used IsoLocaleName() to convert
Windows locales to POSIX format before setting LC_MESSAGES. This means
we're passing "en_US" to a function expecting "English_United States.1252".
The enumeration doesn't find "en_US" among Windows locale names,
returns 0, and gettext falls back to its internal locale resolution
(which still works correctly - translations are not broken, just slow).
2. Missing cache on failure
The get_lcid() function has a cache, but it only updates the cache
when found_lcid > 0 (successful lookup). Failed lookups don't update the
cache, causing the 259-locale enumeration to repeat on every gettext() call.
This is the actual performance bug in gettext. Even if we always passed
a valid Windows locale name, setting lc_messages to 'C' or 'POSIX'
(common in scripts and automation) would trigger the same issue, since
neither is a Windows locale name. Please see the bug I opened with the
gettext project [1].
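To make the failure mode concrete, here is a minimal sketch of a lookup that caches only successful results, next to one that also caches failures. This is not gettext's code: the three-entry table stands in for the ~259 Windows locales, and every name in it (known_locales, enumerate_locales, lookup_cache_success_only, lookup_cache_always) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the ~259 locale names Windows enumerates. */
const char *const known_locales[] = {
    "English_United States.1252",
    "German_Germany.1252",
    "French_France.1252",
};
#define N_KNOWN (sizeof(known_locales) / sizeof(known_locales[0]))

/* Counts how many full table scans have been performed. */
unsigned long enumerations = 0;

/* Slow path: scan every known name. Returns index + 1, or 0 if not found. */
int
enumerate_locales(const char *name)
{
    size_t i;

    enumerations++;
    for (i = 0; i < N_KNOWN; i++)
        if (strcmp(known_locales[i], name) == 0)
            return (int) (i + 1);
    return 0;
}

/* Caches only successful lookups, as described above for get_lcid(). */
int
lookup_cache_success_only(const char *name)
{
    static char last_name[64];
    static int  last_id;
    int         id;

    if (last_id > 0 && strcmp(last_name, name) == 0)
        return last_id;

    id = enumerate_locales(name);
    if (id > 0)
    {
        /* Only a hit is remembered; a miss leaves the cache untouched. */
        strncpy(last_name, name, sizeof(last_name) - 1);
        last_name[sizeof(last_name) - 1] = '\0';
        last_id = id;
    }
    return id;
}

/* Remembers the last result whether or not the lookup succeeded. */
int
lookup_cache_always(const char *name)
{
    static char last_name[64];
    static int  last_id;
    static int  cache_valid;

    if (cache_valid && strcmp(last_name, name) == 0)
        return last_id;

    last_id = enumerate_locales(name);
    /* Names longer than 63 bytes are truncated here and would simply miss. */
    strncpy(last_name, name, sizeof(last_name) - 1);
    last_name[sizeof(last_name) - 1] = '\0';
    cache_valid = 1;
    return last_id;
}
```

Calling lookup_cache_success_only("en_US") a thousand times performs a thousand table scans, while lookup_cache_always("en_US") performs one. A valid name such as "English_United States.1252" is scanned once by either variant.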
3. Empty string bug in early 0.2x.y
gettext 0.20.1 introduced a setlocale_null() wrapper that returns ""
instead of NULL when setlocale() fails. This causes get_lcid("") to be
called, triggering the enumeration bug even when LC_MESSAGES is unset.
The attached patch takes a pragmatic approach: for gettext 0.20.1+, we
avoid triggering the bug by using Windows locale format instead of
calling IsoLocaleName(). This works because gettext 0.20.1+ internally
converts the Windows format back to POSIX for catalog lookups, whereas
0.19.8 and earlier need POSIX format directly.
The patch uses LIBINTL_VERSION to detect the gettext version at compile
time and adjusts behavior accordingly. When locale is NULL, empty, or
set to 'C'/'POSIX', we fall back to using the LC_CTYPE value (which is
already in Windows format and always set).
For gettext 0.19.8 and earlier, the existing IsoLocaleName() path is
retained to maintain compatibility.
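The selection logic described above can be sketched roughly as follows. This is an illustration, not the attached patch. The patch decides at compile time via LIBINTL_VERSION, whereas this sketch takes the version as a runtime argument so it can be exercised on its own. The helper names, the GETTEXT_0_20_1 constant, and the assumed version encoding ((major << 16) + (minor << 8) + subminor) are assumptions. So is the fallback to the unconverted name when no POSIX conversion is available.

```c
#include <stddef.h>
#include <string.h>

/*
 * Assumed encoding of LIBINTL_VERSION:
 * (major << 16) + (minor << 8) + subminor, so 0.20.1 is 0x001401.
 */
#define GETTEXT_0_20_1 0x001401

/* True when the value names no specific locale: NULL, "", "C" or "POSIX". */
int
locale_is_unspecific(const char *locale)
{
    return locale == NULL ||
        locale[0] == '\0' ||
        strcmp(locale, "C") == 0 ||
        strcmp(locale, "POSIX") == 0;
}

/*
 * Pick the string to hand to gettext as LC_MESSAGES.
 *
 * win_locale: the requested locale in Windows format (may be NULL)
 * iso_locale: result of an IsoLocaleName()-style conversion (may be NULL)
 * lc_ctype:   current LC_CTYPE value, assumed to be in Windows format
 */
const char *
choose_messages_locale(int libintl_version,
                       const char *win_locale,
                       const char *iso_locale,
                       const char *lc_ctype)
{
    if (libintl_version >= GETTEXT_0_20_1)
    {
        /* Newer gettext: keep Windows format; avoid "", "C" and "POSIX". */
        if (locale_is_unspecific(win_locale))
            return lc_ctype;
        return win_locale;
    }

    /* Older gettext: prefer the POSIX-style name when one is available. */
    return iso_locale != NULL ? iso_locale : win_locale;
}
```

For example, with version 0x001401 and a request for "German_Germany.1252", the sketch returns the Windows name unchanged. With 0x001308 (0.19.8 under the assumed encoding) it returns the converted "de_DE".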
I don't have automated tests for this since we'd need to test against
multiple versions of a third-party library. I'm open to suggestions if
folks think we should add something to the buildfarm or CI.
Manual testing can be done with this test case:
-- Create test table
CREATE TABLE sampletest (
    a VARCHAR,
    b VARCHAR
);
-- Insert 1 million rows with random data
INSERT INTO sampletest (a, b)
SELECT
    substr(md5(random()::text), 0, 15),
    (100000000 * random())::integer::varchar
FROM generate_series(1, 1000000);
-- Create function that converts string to float with exception handling
CREATE OR REPLACE FUNCTION toFloat(str VARCHAR, val REAL)
RETURNS REAL AS $$
BEGIN
    RETURN CASE
        WHEN str IS NULL THEN val
        ELSE str::REAL
    END;
EXCEPTION
    WHEN OTHERS THEN
        RETURN val;
END;
$$ LANGUAGE plpgsql
COST 1
IMMUTABLE;
-- Test query to trigger ~1M exceptions
-- (nearly all conversions will fail since column a holds random hex
-- strings from md5(); the rare all-digit string converts successfully)
\timing on
SELECT MAX(toFloat(a, NULL)) FROM sampletest;
The remaining ~8 second difference relative to gettext 0.19.8 is due to
the initial enumeration and other code changes made in gettext. Keep in
mind that for 1M exceptions we are probably calling gettext 2-3 million
times.
--
Bryan Green
EDB: https://www.enterprisedb.com
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Avoid-gettext-0.20-performance-bug-on-Windows.patch | text/plain | 1.9 KB |