From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: buschmann(at)nidsa(dot)net, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use
Date: 2019-02-20 02:21:05
Message-ID: CA+hUKGKpQJCWcgyy3QTC9vdn6uKAR_8r__A-MMm2GYfj45caag@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers
On Tue, Feb 19, 2019 at 7:31 AM PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 15641
> Logged by: Hans Buschmann
> Email address: buschmann(at)nidsa(dot)net
> PostgreSQL version: 11.2
> Operating system: Windows Server 2019 Standard
> Description:
>
> I recently moved a production system from PG 10.7 to 11.2 on a different
> Server.
>
> The configuration settings were mostly taken from the old system and
> enhanced with new features of PG 11.
>
> pg_prewarm was used for a long time (with no specific configuration).
>
> I have now enabled huge page support for Windows in the OS and verified
> with the vmmap tool from Sysinternals that it is active
> (the shared buffers are locked in memory: Lock_WS is set).
>
> When pg_prewarm.autoprewarm is set to on (using the default after initial
> database import via pg_restore), the autoprewarm worker process
> terminates immediately and generates a huge number of logfile entries
> like:
>
> CPS PRD 2019-02-17 16:11:53 CET 00000 11:> LOG: background worker
> "autoprewarm worker" (PID 3996) exited with exit code 1
> CPS PRD 2019-02-17 16:11:53 CET 55000 1:> ERROR: could not map dynamic
> shared memory segment
Hmm. It's not clear to me how using large pages for the main
PostgreSQL shared memory region could have any impact on autoprewarm's
entirely separate DSM segment. I wonder if other DSM use cases are
impacted. Does parallel query work? For example, the following
produces a parallel query that uses a few DSM segments:
create table foo as select generate_series(1, 1000000)::int i;
analyze foo;
explain analyze select count(*) from foo f1 join foo f2 using (i);
Looking at the place where that error occurs, it seems like it simply
failed to find the handle, as if it didn't exist at all at the time
dsm_attach() was called. I'm not entirely sure how that could happen
just because you turned on huge pages. Is it possible that there is a
race where apw_load_buffers() manages to detach before the worker
attaches, and that huge pages changed the timing? At a glance, that
shouldn't happen, because apw_start_database_worker() waits for the
worker to exit before returning.
I think we'll need one of our Windows-enabled hackers to take a look.
PS Sorry for breaking the thread. I wish our archives app had a
"[re]send me this email" button, for people who subscribed after the
message was sent...
--
Thomas Munro
https://enterprisedb.com