Re: pgsql: Add putenv support for msvcrt from Visual Studio 2013

From: Noah Misch <noah(at)leadboat(dot)com>
To: Christian Ullrich <chris(at)chrullrich(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, michael(dot)paquier(at)gmail(dot)com
Subject: Re: pgsql: Add putenv support for msvcrt from Visual Studio 2013
Date: 2016-11-16 07:22:46
Message-ID: 20161116072246.GA2297058@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Tue, Apr 26, 2016 at 07:39:29PM +0200, Christian Ullrich wrote:
> * Christian Ullrich wrote:
>
> >wrong even without considering the debug/release split. If we load a
> >compiled extension built with a CRT we have not seen yet, _after_ the
> >first call to pgwin32_putenv(), that module's CRT's view of its
> >environment will be frozen because we will never attempt to update it.
>
> Four patches attached:
>
> master --- 0001 --- 0002 --- 0003
> \
> `- 0004
>
> 0001 is just some various fixes to set the stage.

Patch 1 looks good, except that it should be three patches:

- cosmetic parts: change whitespace and s/0/NULL/
- remove CloseHandle() call
- probe for debug CRT modules, not just release CRT modules

Please split along those boundaries. I plan to back-patch all of that. I've
seen some gettext builds mysteriously ignore "SET lc_messages = ..."; ignoring
debug CRTs could have been the cause.

> I tested this with a project
> (<https://bitbucket.org/chrullrich/pgputenv-demo>) that contains two DLLs:

That's a pithy test; thanks for assembling it.

> Even with patch 0004, there is still a race condition between the main
> thread and a theoretical additional thread created by some other component
> that unloads some CRT between GetProcAddress() and the _putenv() call, but
> that is hardly fixable.

I think you can fix it by abandoning GetModuleHandle() in favor of
GetModuleHandleEx() + GetProcessAddress() + FreeLibrary(). I recommend also
moving the SetEnvironmentVariable() call before the putenv calls. That way,
if a CRT loads while pgwin32_putenv() is executing, the newly-loaded CRT will
always have the latest value. (I'm assuming CRTs populate their environment
from GetEnvironmentStrings(), because I can't think of an alternative.)

As a separate patch, I am inclined to remove the "#ifdef _MSC_VER" directive,
activating its enclosed code under all compilers. A MinGW-built postgres.exe
has the same need to update all CRTs.

> The fact that master looks like it does means that there have not been many
> (or any) complaints about missing cross-module environment variables. If
> nobody ever needs to see a variable set elsewhere, we have a very simple
> solution: Why don't we simply throw out the whole #ifdef _MSC_VER block?

pgwin32_putenv() originated, in commit 0154345, to make "SET lc_messages =
..." effective when gettext uses a different CRT from postgres.exe. I suspect
it also makes krb_server_keyfile effective when GSS uses a different CRT.
Those are achievements worth keeping. I'm not surprised about the lack of
complaints, because environment variables don't often change after backend
startup. Here are some ways one could notice the difference between master
and patches 2+3 or 2+4:

- Use shared_preload_libraries to load a module that reads LC_CTYPE or
LC_COLLATE. CheckMyDatabase() sets those variables subsequent to
process_shared_preload_libraries().

- Load, at any time, a module that reads LC_MESSAGES. There's little reason
to read that variable outside of gettext. A module could use a gettext DLL
other than the postgres.exe gettext DLL, but such a module would need to
work around pg_bindtextdomain() always using the postgres.exe gettext.

- Load, at any time, a module that itself changes environment variables, other
than LC_MESSAGES, after backend startup. I suspect PL/Python suffices.

Those are plausible scenarios, but they're sufficiently specialized that
problems could lie unnoticed or undiagnosed for years. I lean against
back-patching anything from patches 2, 3 or 4.

> There is another possible fix, ugly as sin, if we really need to keep the
> whole environment update machinery *and* cannot do the full loop each time.
> Patch 0003 pins each CRT when we see it for the first time.
> GET_MODULE_HANDLE_EX_FLAG_PIN is documented as "The module stays loaded
> until the process is terminated, no matter how many times FreeLibrary is
> called", so the unload race cannot occur anymore.

I prefer the simplicity of abandoning the cache (patch 4), if it performs
decently. Would you compare the performance of patch 1, patches 1+2+3, and
patches 1+2+4? This should measure the right thing (after substituting
@libdir@ for your environment):

CREATE FUNCTION putenv(text)
RETURNS void
AS '@libdir@/regress.dll', 'regress_putenv'
LANGUAGE C STRICT;
\timing on
SELECT count(putenv('foo=' || n)) FROM generate_series(1,1000000) t(n);

(I'm interested for the sake of backend startup time. I recall nine putenv()
in every backend startup, seven in main() and two in CheckMyDatabase().)

Thanks,
nm

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Magnus Hagander 2016-11-16 09:38:19 Re: [COMMITTERS] pgsql: Build HTML documentation using XSLT stylesheets by default
Previous Message Peter Eisentraut 2016-11-16 07:06:52 pgsql: Build HTML documentation using XSLT stylesheets by default

Browse pgsql-hackers by date

  From Date Subject
Next Message Okano, Naoki 2016-11-16 07:31:00 Re: Adding the optional clause 'AS' in CREATE TRIGGER
Previous Message Peter Eisentraut 2016-11-16 07:06:52 pgsql: Build HTML documentation using XSLT stylesheets by default