| From: | Bryan Green <dbryan(dot)green(at)gmail(dot)com> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
| Subject: | Re: Can we get rid of TerminateThread() in pg_dump? |
| Date: | 2026-07-03 13:55:40 |
| Message-ID: | fc4d8e83-0057-450a-917c-d8c2c79b336b@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 7/2/2026 10:32 PM, Thomas Munro wrote:
> Hi,
>
> Following up on this ancient discussion and resulting commit e652273e...
>
> https://www.postgresql.org/message-id/flat/11515.1464961470%40sss.pgh.pa.us#3e78a2af445dd7e566cf499023e8cb97
>
> write(fd, ...) locks fd's entry in a user space descriptor table on
> Windows, so if the terminated thread was also writing to stderr, I am
> pretty sure it would hang. Someone with Windows might be able to
> repro that by hacking the workers to write to stderr a lot?
> WriteFile(_get_osfhandle(STDERR_FILENO), ...) might avoid that
> specific issue, but for all I know that's just the tip of the iceberg
> considering the socket stuff.
>
> Apparently this hasn't been a problem in practice. Error reporting
> coinciding with ^C must be unlikely? I'd still like to find a
> non-evil way to suppress log output, though. What if we atomically
> pointed STDERR_FILENO to /dev/null, with dup2()? I wrote a patch to
> try that idea out, but I don't have Windows, so I'm sharing this as a
> curiosity in case anyone wants to try it and/or comment on all this.
> Or has a better idea. Preferably that would work on Windows and Unix
> (if it also used threads).
>
> (How I arrived at this obscure hypothetical problem: I've been
> teaching pg_dump (and everything else) to use threads on Unix too, ie
> harmonising Windows and Unix code paths. We certainly don't want
> thread termination/cancellation in pg_threads.h (modern APIs don't
> even have that, designers of older APIs regret it, including the
> Windows people who are quite emphatic about there being no safe way to
> use it[1]), so I wanted to find another way to silence error output.
> close() would kinda work but cause chaos if another open() squats the
> fd number, which leads to the idea of dup2(dummy, STDERR_FILENO).
> Then I wondered if that would also work on Windows' fake fd system, ie
> whether dup2(dummy, STDERR_FILENO) would play nicely with concurrent
> write(STDERR_FILENO) or fwrite(stderr) as the file is switched. Yes,
> as far as I can tell, but as soon as I read about the user space fd
> locks that make that work, the above concrete problem with write() vs
> TerminateThread() became clear.)
>
> [1] https://devblogs.microsoft.com/oldnewthing/20150814-00/?p=91811
I build pg_dump on Windows, so I tried this out (MSVC 19.51, VS 2026).
It works. It compiles clean, and on a ^C during a parallel dump it
suppresses the worker cancel messages the same way the TerminateThread
call did-- I checked against unmodified master with the TerminateThread
call still in place, and the output is identical-- "terminated by user"
and nothing else.
Moving write_stderr off fileno(stderr) to STDERR_FILENO is worth having
on its own: that shim's comment notes _fileno returns -1 when the stream
is closed, which is a state the shutdown path can be in.
Opening NUL in the handler worked here; I didn't need to open it in advance.
Since you mention moving parallel.c to threads on non-Windows, I've been
replacing the socketpair/select() worker protocol on Windows with an
in-process queue (a mutex and two condition variables)[1], heading the
same direction to unify across platforms. There a worker signals its own
death through the condition variable rather than the leader seeing pipe
EOF, so ^C hits an extra pg_fatal("a worker process died unexpectedly")
in the leader-- and your redirect swallows that one too. I put your
patch on top of that series and hit ^C on a 2.3 GB dump repeatedly at -j
2, 4, 8, and 100: clean exit every time, "terminated by user", nothing
else. pg_stat_activity right afterward showed no live COPYs, up through
-j 100-- so the PQcancel()s land and the backends actually stop, which
TerminateThread never did. So your change composes with that work.
[1]
https://www.postgresql.org/message-id/flat/8c712d76-ecf7-4749-a6d8-dddc01f298ec%40gmail.com
--
Bryan Green
EDB: https://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Florin Irion | 2026-07-03 13:56:53 | Re: psql tab completion for user functions and if explicitly required also "pg_" |
| Previous Message | Florin Irion | 2026-07-03 13:32:56 | Re: pg_plan_advice: add NO_ scan and join method tags |